Compositions and methods for nucleic acid expression and protein secretion in Bacteroides

ABSTRACT

Provided are nucleic acids that include a promoter, where the promoter is operable in a Bacteroides cell and is operably linked to a heterologous nucleotide sequence of interest. Also provided are nucleic acids that include a promoter (operable in a prokaryotic cell such as a Bacteroides cell) operably linked to a sequence encoding a synthetic ribosomal binding site (RBS). Also provided are fusion proteins (and nucleic acids encoding them) in which a secreted Bacteroides polypeptide is fused to a heterologous polypeptide of interest. Also provided are prokaryotic cells (e.g., E. coli, a Bacteroides cell, and the like) that include one or more nucleic acids such as those described above. Also provided are methods of expression in a prokaryotic cell, methods of detectably labeling a Bacteroides cell in an animal's gut, and methods of delivering a protein to an individual's gut.

CROSS-REFERENCE

This application is a Continuation of U.S. application Ser. No. 16/094,694, filed Oct. 18, 2018, which is a National Stage Entry of International Application No. PCT/US2017/028066, filed Apr. 18, 2017, which application claims the benefit of U.S. Provisional Application No. 62/325,379, filed Apr. 20, 2016, each of which application is incorporated herein by reference.

GOVERNMENT RIGHTS

This invention was made with Government support under contracts OD006515 and DK085025 awarded by the National Institutes of Health. The Government has certain rights in the invention.

INCORPORATION BY REFERENCE OF SEQUENCE LISTING PROVIDED AS A SEQUENCE LISTING XML

A Sequence Listing is provided herewith as a Sequence Listing XML, “STAN-1296CON_Seq_List_ST26.xml” created on Aug. 9, 2023 and having a size of 1,001,170 bytes. The contents of the Sequence Listing XML are incorporated by reference herein in their entirety.

INTRODUCTION

The human gastrointestinal tract is a highly evolved human-microbial interface in which resident microbes are continually sensing and responding to numerous biochemical cues. In addition to their native role in digestion, immune function, metabolism, and the nervous system, gut-resident bacteria have untapped potential to be engineered to conduct specific tasks, record events, and make decisions. Such technology would benefit greatly from the development of genetic tools for manipulating members of the microbiota. Creation and implementation of such a toolkit would vastly expand the array of questions about the gut microbiota that can be experimentally addressed, and provide a foundation for engineering diagnostic or therapeutic microbes. There is a need in the art for genetic tools for abundant gut bacterial species.

While great advances have been made with genetic manipulation of proteobacteria, particularly E. coli, this taxon is typically not a prominent component of the healthy human adult microbiota. The Bacteroides, the most abundant genus within the US American gut, are capable of utilizing both dietary and host-derived nutrient sources, and are known to have an important role in immune development. Although some tools are available for genetic manipulation and expression in Bacteroides, the strongest promoters identified to date are insufficient, e.g., for microscopic imaging of fluorescent protein expression.

There is a need in the art for compositions and methods for reliable nucleic acid expression (generation of RNA and protein from DNA) in prokaryotes (e.g., Bacteroides). The present disclosure provides such methods and compositions (e.g., nucleic acids, expression vectors).

SUMMARY

Compositions and methods are provided for the expression of nucleic acids. For example, provided are nucleic acids that include a promoter operably linked to a heterologous nucleotide sequence of interest (e.g., an insertion sequence such as a multiple cloning site, a heterologous nucleic acid sequence, such as a transgene, e.g., a selectable marker, a reporter, a therapeutic polypeptide, and the like), where the promoter is operable in a Bacteroides cell. Also provided are nucleic acids that include a promoter (operable in a prokaryotic cell such as a Bacteroides cell) operably linked to: (i) a sequence encoding a synthetic ribosomal binding site (RBS) and (ii) a nucleotide sequence of interest. Also provided are fusion proteins (and nucleic acids encoding them) in which a secreted Bacteroides polypeptide is fused to a heterologous polypeptide of interest. Also provided are prokaryotic cells (e.g., E. coli, a Bacteroides cell, and the like) that include nucleic acids such as those described above.

Provided are methods of expressing a transgene in a prokaryotic cell (e.g., using a subject nucleic acid), methods of detectably labeling a Bacteroides cell in an animal's gut (e.g., labeling Bacteroides cells that are distinguishable from one another), and methods of delivering a protein to an individual's gut, where such methods can be employed as methods of treatment.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention is best understood from the following detailed description when read in conjunction with the accompanying drawings. The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee. It is emphasized that, according to common practice, the various features of the drawings are not to scale. On the contrary, the dimensions of the various features are arbitrarily expanded or reduced for clarity. Included in the drawings are the following figures.

FIG. 1. Schematic of the high-throughput cloning and genomic integration pipeline for Bacteroides using 96-well compatible liquid handling steps. The pipeline was applied to 54 specifically designed genomically integrated cassettes across four Bacteroides species, resulting in more than 99% correct plasmid assembly.

FIG. 2a-2f. Identification of a phage promoter capable of high protein expression. (FIG. 2a) Native Bt promoters expected to give high protein expression from literature (P_(rRNA) and P_(BT1311)) and transcriptomics data (P_(BT1830) and P_(BT4615)) were compared to a phage promoter (P_(BfP1E6)) (SEQ ID NO: 8) via fluorescence from GFP expression. The RBS used was either the strongest RBS experimentally identified from a 192 member RBS library for each promoter (black bars) or the strongest RBS from the P_(BfP1E6) RBS library (grey bars). (FIG. 2b) Fitness of the high expression Bt strain, with P_(BfP1E6) driven GFP, was tested in competition against a non-expressing strain, showing only a minor fitness defect and stable colonization over a 10-week period in gnotobiotic mice. (FIG. 2c) The bi-colonized mouse from FIG. 2b with the median ratio of expressing and non-expressing strains was selected. Imaging of the distal colon demonstrates that the endogenous fluorescence from the GFP expressing portion of the population was sufficient for detection in vivo. Host tissue (lower left) was bordered by phalloidin stain of actin (red), and luminal contents contained both expressing (green) and non-expressing (white; DAPI only) Bt. (FIG. 2d) The fluorescence of 214 Bt strains, each containing a mutation in the P_(BfP1E6) promoter, was compared to the P_(BfP1E6) level. The x-axis represents the position of each mutation and diamonds, circles, triangles and squares represent a mutation to the residue A, C, G or T, respectively, with the average mutant value at each position traced in grey. The previously characterized −7 and −33 motifs are highlighted in blue and the putative UP-element motifs revealed here are highlighted in red. (FIG. 2e) Constitutive promoters derived from P_(BfP1E6) were compared via luciferase expression dependent luminescence relative to P_(BfP1E6). (FIG. 2f) Different RBSs under P_(BfP1E6)-driven luciferase were compared. RBS1 (sr1) was rationally designed for weak expression and RBS2-8 (sr2-8) were selected from the A/T rich RBS library. Error bars represent the 95% confidence interval for replicates of at least 3 independent experiments (FIG. 2a, FIG. 2d, FIG. 2e, FIG. 2f) or 5 different mice (FIG. 2b).

FIG. 3a-3f. Phage promoter set can predictably tune protein expression across the Bacteroides genus, allowing simultaneous strain identification in vivo. (FIG. 3a) Luminescence was measured from 56 promoter-RBS combinations (all possible FIG. 2e-f combinations, excluding the weakest promoter) driving luciferase expression in each of four species: Bt, Bv, Bo and Bu. Measured luminescence is plotted against expected luminescence calculated by multiplying relative promoter and RBS strengths in Bt (FIG. 2e-f). Individual strain measurements, a linear fit of log₁₀ values, and associated R² are colored by species: Bt (blue), Bv (red), Bo (green) and Bu (purple). (FIG. 3b) A unique combination of one of three GFP expression levels or two mCherry expression levels was encoded in each of six Bacteroides species. Independently measured single-cell fluorescence profiles representing 95% of the cells for each species, as determined by microscopy of mid-log cultures, are plotted with the associated species label. (FIG. 3c) The single-cell fluorescence profile from imaging the six-member community in the distal colon is shown. (FIG. 3d) A representative transformed image of the six-member community within the distal colon is shown. Each pixel was independently transformed to better display log-separated GFP intensity and showed clearly distinguished cells for all six species (blue Be; cyan Bo; green Bt; red Bf; orange Bu; yellow Bv). Pixels near transformation thresholds are colored in grey; few ambiguous cells are present. (FIG. 3e) A larger transformed image, used for FIG. 3d, shows the six Bacteroides species localization relative to host nuclei (blue near bottom of image), actin-delineated epithelial boundary (white) and mucus (purple). The smaller image in FIG. 3d is outlined with a dashed white box. (FIG. 3f) An image from the six-member community shows more clonal Bacteroides population distributions within ingested plant material (plant cell walls in purple) in the distal colon. Bo (cyan) predominates in this image, while populations of Bt, Bu and Bv can also be seen.
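By way of non-limiting illustration, the expected-luminescence calculation described for FIG. 3a (multiplying a relative promoter strength by a relative RBS strength) can be sketched as follows. All part names and strength values below are hypothetical placeholders, not measurements from the figures; seven promoter entries and eight RBS entries are used only so that the number of combinations matches the 56 stated in the caption.

```python
from itertools import product

# Hypothetical relative strengths (fraction of the strongest part).
# These numbers are illustrative assumptions, not data from FIG. 2e-2f.
promoter_strengths = {
    "P1": 1.0, "P2": 0.4, "P3": 0.2, "P4": 0.05,
    "P5": 0.02, "P6": 0.005, "P7": 0.001,
}
rbs_strengths = {
    "sr1": 0.001, "sr2": 0.01, "sr3": 0.03, "sr4": 0.1,
    "sr5": 0.2, "sr6": 0.4, "sr7": 0.7, "sr8": 1.0,
}


def expected_luminescence(promoter, rbs):
    """Expected relative output under a simple multiplicative model."""
    return promoter_strengths[promoter] * rbs_strengths[rbs]


# Enumerate every promoter-RBS pairing and its expected relative output.
expected = {
    (p, r): expected_luminescence(p, r)
    for p, r in product(promoter_strengths, rbs_strengths)
}

print(len(expected))  # number of promoter-RBS combinations
```

In such a sketch, the expected values would then be plotted against measured luminescence on log₁₀ axes, in the manner the caption describes.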

FIG. 4. Golden Gate assembly schematic for pNBU2 based plasmids. The junctions used in BsaI assembly of expression cassettes are capitalized. The split ampicillin resistance gene only functions when reassembled, thus eliminating carry through of undigested parts. BsmBI can be subsequently used for assembling multi-cassette integration plasmids.

FIG. 5a-5b. Comparison of three GFP expression distributions across strains generated using different RBS libraries. (FIG. 5a) Bt strains with GFP expression driven by P_(rRNA) with RBS sequences from one of three different RBS libraries: an A/G-rich degenerate sequence, N₁₁R₇N₂cgtaaATG (SEQ ID NO: 373), an unbiased degenerate sequence, N₂₀cgtaaATG (SEQ ID NO: 374), and an A/T-rich sequence, N₉W₃A₃W₂tWaNaataATG (SEQ ID NO: 375). For each library 192 colonies were screened for GFP fluorescence. Most readings were close to background autofluorescence, 1 au. The fluorescence readings from the strains of the A/T-rich RBS library were significantly higher than from the strains of the A/G-rich or unbiased degenerate RBS libraries (P=2×10⁻¹⁴ and 4×10⁻⁹, respectively, Student's t-test). When repeated in triplicate, the highest expression strain from the A/T-rich library produced fluorescence at 1.4 au. (FIG. 5b) RBS libraries were generated similarly for P_(BT1763)-driven GFP expression and at least 72 colonies for each library were screened. Similar to P_(rRNA), the A/T-rich libraries produce populations with higher fluorescent expression than the other two libraries (P<2×10⁻⁶). Additionally, the fluorescence readings from the strains of the A/G rich RBS library were significantly weaker than those of the unbiased degenerate RBS library (P=4×10⁻⁵).
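For context, the theoretical number of distinct sequences encoded by each degenerate RBS library of FIG. 5a can be estimated from the standard IUPAC nucleotide codes (N = any base, R = A or G, W = A or T). The following is a minimal sketch; the function name `library_size` and the plain-digit notation (e.g., "N11" for N₁₁) are conventions assumed here for illustration only.

```python
import re

# Number of bases each IUPAC symbol used in the libraries can represent.
IUPAC_DEGENERACY = {"A": 1, "C": 1, "G": 1, "T": 1, "R": 2, "W": 2, "N": 4}


def library_size(pattern):
    """Count the distinct sequences encoded by a compact degenerate pattern.

    The pattern is written with plain digits for repeat counts, e.g.
    "N11R7N2cgtaaATG"; a symbol without a count occurs once.
    """
    size = 1
    for symbol, count in re.findall(r"([A-Za-z])(\d*)", pattern):
        repeats = int(count) if count else 1
        size *= IUPAC_DEGENERACY[symbol.upper()] ** repeats
    return size


libraries = {
    "A/G-rich": "N11R7N2cgtaaATG",
    "unbiased": "N20cgtaaATG",
    "A/T-rich": "N9W3A3W2tWaNaataATG",
}
for name, pattern in libraries.items():
    print(name, library_size(pattern))
```

Under these assumptions each library encodes far more variants than the 192 colonies screened per library, so a screen of that size samples only a small fraction of the possible sequences.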

FIG. 6. Influence of phage promoter length on protein expression. The phage promoter length, in base-pairs, used to drive GFP expression is indicated with positions relative to the translation start site. Error bars represent the 95% confidence interval from 3 biological replicates.

FIG. 7. P_(BfP1E6)-driven GFP fluorescence from a single genomic copy was visible by eye. A cell pellet from non-GFP expressing Bt (left) was compared to a pellet with Bt harboring a P_(BfP1E6)-driven GFP expression construct (right) suspended over a UV box. The image is unprocessed.

FIG. 8. In vitro fitness assay of GFP-expressing Bt. Bt with P_(BfP1E6)-driven GFP expression was mixed 1:1 with a non-expressing Bt strain, grown anaerobically in TYG medium and passaged twice per day at 1:50 and 1:100 dilution, giving a product of the dilutions of 1.6×10⁻¹⁰ (~33 doublings) at day 4. Each day duplicate cultures at mid-log phase were assayed for bulk fluorescence (relative to 100% GFP positive and negative cultures). Error bars represent the 95% confidence interval from 2 independent biological replicates.
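The relationship between the cumulative dilution and the approximate number of doublings stated for FIG. 8 can be checked with a short calculation. This sketch assumes that the culture regrows to roughly its starting density after each passage, so that the number of doublings equals log₂ of the inverse of the cumulative dilution; the function name is hypothetical.

```python
import math


def doublings_from_dilution(cumulative_dilution):
    """Doublings needed to offset a cumulative dilution (assumes full regrowth)."""
    return math.log2(1.0 / cumulative_dilution)


# Cumulative dilution product stated in the caption for day 4.
doublings = doublings_from_dilution(1.6e-10)
print(f"{doublings:.1f}")
```

The result is about 32.5, consistent with the approximately 33 doublings given in the caption.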

FIG. 9a-9e. Demonstration of the method for quantifying GFP positive cells from FIG. 2b-2c. (FIG. 9a) A 203×203 μm confocal image was taken of a distal colon section with endogenous GFP fluorescence and staining with DAPI for host nuclei and bacteria (white) and phalloidin for the host epithelial boundary (red). Dietary material also fluoresces strongly in the DAPI channel and can be distinguished from bacteria by its large size. (FIG. 9b) In an expanded portion of FIG. 9a represented by the magenta dashed box, bacteria with only DAPI (white) or DAPI and GFP fluorescence can be seen. (FIG. 9c) In ImageJ (NIH), the deconvolved DAPI image is thresholded to generate a mask of individual objects of bacterial cell size. (FIG. 9d) The GFP channel is used to quantify the average fluorescent intensity for each object delineated in FIG. 9c. (FIG. 9e) A histogram of the fluorescence value of single cells demonstrates a large separation between non-fluorescing (black bars), most of which are below 1 au, and fluorescing (green bars) cells, most of which are above 20 au. Objects of ambiguous intensities (grey bars) make up about 4% of objects.

FIG. 10. Transcript abundance at various locations along the gut and in different growth phases in culture was compared for GFP driven by either P_(BfP1E6) or P_(rRNA). RT-qPCR reading of promoter specific transcript amplification, GFP, was normalized by 16S rRNA specific (not overlapping with P_(rRNA)) transcript amplification. P_(BfP1E6) transcript measurements (left bars) varied by less than four-fold across all conditions, while P_(rRNA) measurements (right bars) varied by more than 40-fold. Error bars represent the 95% confidence interval from different mice or biological replicates.

FIG. 11a-11d. (FIG. 11a) (SEQ ID NO: 512) The upstream region important for phage promoter function is conserved in native Bt promoters. For each gene in the Bt genome, a candidate promoter sequence was identified by the presence of the −7 conserved sequence, TAnnTTTGnnn (SEQ ID NO: 372), ending within 10 to 60 nucleotides of the start codon of the first gene in the operon (operons predicted by microbesonline.org). These criteria were met for 898 genes, which were entered into the WebLogo 3 (http)//(weblogo) dot (threeplusone) dot (com) sequence logo creation software to illustrate the information content of each residue. The −33 box reported to conserve the TTTG sequence is highlighted in blue and the upstream regions found to be important in the phage promoter mutational analysis are highlighted in red, with the sequence of the phage promoters aligned below the logo for reference. Despite the many misidentified putative promoters expected in this simple analysis, the −33 region did appear to be conserved in this dataset, and the −50 region appeared to be more highly conserved. (FIG. 11b) A standard curve of luminescence produced from purified NanoLuc (Promega) luciferase protein is shown for estimating the absolute protein concentrations. The linear fit to the log₁₀ values and the corresponding equation and R² are shown. (FIG. 11c) Luminescence produced by NanoLuc driven by the different phage promoters in FIG. 4B was measured concurrently with the standard curve and compared. Using measured CFUs (5×10⁶ CFU/μL) and other estimates (see methods) that corresponded to a ~0.5% cytoplasmic fraction of saturated culture volume, the absolute cytoplasmic concentration of NanoLuc is estimated for each strain. (FIG. 11d) Relative expression from promoters P_BfP1E4, P_BfP5E4, P_BfP2E5, P_BfP4E5, and P_BfP1E6 driving GFP (bottom line) or mCherry (top line) is compared to corresponding luciferase expression (dotted/center line).
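The candidate-promoter criterion described for FIG. 11a (presence of the TAnnTTTGnnn motif ending within 10 to 60 nucleotides of the start codon) can be illustrated with a simple pattern search. The sketch below is not the analysis actually performed; it assumes that the input is the sequence immediately 5′ of a start codon and that the distance is counted from the last motif nucleotide to the first nucleotide of the start codon. The function name and example sequences are hypothetical.

```python
import re

# TAnnTTTGnnn, where n is any nucleotide; the lookahead permits overlapping hits.
MOTIF = re.compile(r"(?=(TA[ACGT]{2}TTTG[ACGT]{3}))")


def find_candidate_promoters(upstream, min_gap=10, max_gap=60):
    """Return (start, motif, gap) for motif hits whose end lies min_gap..max_gap
    nucleotides upstream of the start codon.

    `upstream` is assumed to end at the nucleotide just before the start codon.
    """
    seq = upstream.upper()
    hits = []
    for match in MOTIF.finditer(seq):
        gap = len(seq) - match.end(1)  # nucleotides between motif and start codon
        if min_gap <= gap <= max_gap:
            hits.append((match.start(1), match.group(1), gap))
    return hits


# Hypothetical upstream region: motif followed by a 25-nucleotide spacer.
example = "G" * 20 + "TAGCTTTGCAT" + "C" * 25
print(find_candidate_promoters(example))
```

A motif located closer than 10 or farther than 60 nucleotides from the start codon would be excluded by the same function.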

FIG. 12. The phage promoter set produced GFP expression matching expectation from characterization with luciferase. The strongest 6 phage promoter variants from FIG. 2e drove GFP expression in Bt (blue), Bv (red), Bo (green), Bf (purple) and Be (orange). GFP expression, relative to P_(BfP1E6) in Bt, is plotted against luciferase expression relative to P_(BfP1E6) in Bt (FIG. 2e). A linear fit of log₁₀ values of the 5 strongest promoters, with the weakest promoter excluded due to the high contribution of background auto-fluorescence (0.8%), gave an R² of 0.92.
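A linear fit of log₁₀ values with an associated R², as referred to for FIG. 12 and FIG. 3a, can be computed by ordinary least squares on the log-transformed data. The following is a minimal sketch using only the Python standard library; the data points are hypothetical and are not values from the figures.

```python
import math


def loglog_fit(x, y):
    """Least-squares line through (log10 x, log10 y); returns slope, intercept, R^2."""
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(lx, ly))
    ss_tot = sum((b - mean_y) ** 2 for b in ly)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical relative luciferase (x) and GFP (y) expression values.
luciferase = [0.01, 0.05, 0.2, 0.5, 1.0]
gfp = [0.012, 0.04, 0.25, 0.45, 1.0]
slope, intercept, r_squared = loglog_fit(luciferase, gfp)
print(slope, intercept, r_squared)
```

Fitting on log-transformed values gives equal weight to each order of magnitude, which is appropriate when expression levels span several decades.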

FIG. 13a-13b. Distal colon image (from FIG. 3e) prior to processing and transformation. (FIG. 13a) A three-channel image of the field of view used for FIG. 3e shows DAPI (blue), sfGFP (green), UEAI-Alexa488 for mucus (also in green), mCherry (red), and Phalloidin-Alexa594 for actin delineation of host epithelium (also in red). (FIG. 13b) Using linear unmixing on a Zeiss LSM 700 confocal microscope, the image was separated into 5 channels, DAPI (blue), sfGFP (green), UEAI-Alexa488 (cyan), mCherry (red), and Phalloidin-Alexa594 (purple), while the background autofluorescent material was largely eliminated. This 5-channel image was then transformed, to better visualize the log-separated GFP values, to give FIG. 3e.

FIG. 14a-14c. A control three-member community for estimating error. (FIG. 14a) A community of Bf, Bo and Bv was used to estimate error in identifying members of the six-member community from FIG. 3. The single-cell fluorescence profiles from independent culture for this community were plotted with the associated species label (similar to FIG. 3b). (FIG. 14b) A germ-free mouse was colonized with the three-member community. An unprocessed image with high GFP gain, so that intermediate GFP levels can be visualized, from the distal colon is shown. (FIG. 14c) The single-cell fluorescence values of individual cells from the previous image (FIG. 14b) clustered as expected, but with larger deviations than seen in independent culture (FIG. 14a), due in part to difficulties in microscopy and image processing techniques associated with imaging gut sections. Thresholds used to determine species identity were used to quantify the number of cells that would be miscategorized (area in red) as the absent species (Bt, Bu and Be), giving a 5.9% error rate.

FIG. 15 depicts Table 1, which shows the percentage of correctly assembled, genomically integrated constructs for each species using the high-throughput cloning and conjugation protocol; and Table 2, which shows a list of oligonucleotides used for RBS libraries. Top to bottom in Table 2: (SEQ ID NOs: 365-370).

FIG. 16 depicts Table 3, which shows a list of plasmids that were used in the Examples section. The sequences for each listed construct are set forth as, from top to bottom, SEQ ID NOs: 94-148.

FIG. 17a-17d. Data related to diverse Bacteroides species engineered to secrete peptides into extracellular space. (FIG. 17a) B. thetaiotaomicron cell culture pellet and filter sterilized supernatant were analyzed via mass spectrometry proteomics and candidate secreted proteins were identified by abundance in the cell culture supernatant. (FIG. 17b) Protein product of BT0525 can direct a 6×His/3×FLAG peptide outside the cell in six divergent Bacteroides species. Western blot analysis of cell pellet (P) and culture supernatant (S) from mid-log cultures of Bacteroides species using a monoclonal anti-3×FLAG antibody. (FIG. 17c) Schematic of cargo peptides secreted via BT0525 using a designed cleavable linker system to allow for release into extracellular space. (FIG. 17d) Secreted cargo 6×His/3×FLAG peptide is released from carrier BT0525 upon addition of mouse cecal extract (CE) when fusion linker is designed to be targeted by gut proteases. Western blot of culture supernatant from B. thetaiotaomicron secreting fusion proteins with either a non-cleavable or cleavable linker, exposed to either PBS or CE, using anti-3×FLAG monoclonal antibody.

FIG. 18a-18b. B. thetaiotaomicron secreting anti-inflammatory peptides protects mice from DSS-induced colitis. (FIG. 18a) Gnotobiotic mice colonized with a model three-member community of Edwardsiella tarda, Clostridium scindens, and Bacteroides vulgatus were given 5% DSS in drinking water. Mice that also received B. thetaiotaomicron secreting anti-inflammatory peptides lost significantly less weight than mice that were untreated. (FIG. 18b) Disease Activity Index at time of sacrifice was significantly lower in mice that received B. thetaiotaomicron secreting an effective anti-inflammatory protein than those receiving a tripeptide only.

FIG. 19 depicts amino acid sequences of proteins found (during work disclosed in the examples section herein) to be secreted from B. thetaiotaomicron cultures. The listed proteins are a non-limiting list of possible proteins that can be used as a secreted Bacteroides protein that is part of a subject secreted fusion protein (e.g., where a polypeptide of interest is fused to a secreted Bacteroides protein).

FIG. 20 depicts E. coli cells expressing a GFP transgene that is operably linked to the promoter of SEQ ID NO: 388 (which is demonstrated here to be operable in Bacteroides cells, and also in E. coli cells). S17-1 is a strain of E. coli used to conjugate plasmids over to Bacteroides cells.

FIG. 21 depicts a sequence alignment of the promoters of Table 6.

FIG. 22. Bt secretes proteins via OMVs. When secreted protein candidates were cloned under constitutive expression with a 3×FLAG tag and cell pellet (P), cell-free culture supernatant (S), ultracentrifuged S to remove OMVs (U), and recovered OMVs (O) were analyzed via western blot, protein products of BT1488 and BT3742 localized to OMVs (presence of BT3742 in the ultracentrifuged supernatant is accounted for by lysis) while BT0525 localized mainly to the cell-free supernatant.

FIG. 23. Diverse species of Bacteroides secrete BT0525. Western blot analysis of Bv, Bu, and Be strains expressing sfGFP and BT0525, each under P_(BfP1E6) and with a 3×FLAG tag. Cell pellets show expression of both proteins, while culture supernatants demonstrate secretion of BT0525 independent of lysis. These three species of Bacteroides are able to accumulate more BT0525 signal in the supernatant than Bt, Bf, or Bo for unknown reasons. This could be due to differential expression of secretion machinery, degradation machinery in the periplasm or at the cell membrane, or of proteases that are released extracellularly.

FIG. 24a-24f. Colonization by Bt prevents crypt localization of an isogenic strain. (FIG. 24a) Fecal densities of sequentially introduced isogenic Bt strains with differing antibiotic resistance in conventional mice by selective plating (erm, top line; tet, bottom line). (FIG. 24b) Schematic for experiment in (FIG. 24c-FIG. 24f) in which germ-free mice are colonized with GFP- and RFP-expressing Bt strains either one week apart (bottom) or simultaneously (top). (FIG. 24c) The relative abundance of GFP expressing Bt, relative to the total (GFP plus RFP) Bt, is quantified for lumen (grey bars) and crypt (black bars) for the co-colonized and sequentially colonized mice. Error bars represent the 95% confidence interval for mice (n=3) in each group (* P<0.05, ** P<0.01). (FIG. 24d) Image of luminal and crypt bacteria from co-colonized mouse proximal colon. The lumen-epithelium interface is represented by the dashed white line. Scale bar, 10 315 μm. (FIG. 24e) Representative crypt from simultaneous colonization. (FIG. 24f) Representative crypt from sequential colonization.

DETAILED DESCRIPTION

Before the present methods and compositions are described, it is to be understood that this invention is not limited to the particular method or composition described, as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting, since the scope of the present invention will be limited only by the appended claims.

Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limits of that range is also specifically disclosed. Each smaller range between any stated value or intervening value in a stated range and any other stated or intervening value in that stated range is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included or excluded in the range, and each range where either, neither or both limits are included in the smaller ranges is also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, some potential and preferred methods and materials are now described. All publications mentioned herein are incorporated herein by reference to disclose and describe the methods and/or materials in connection with which the publications are cited. It is understood that the present disclosure supersedes any disclosure of an incorporated publication to the extent there is a contradiction.

As will be apparent to those of skill in the art upon reading this disclosure, each of the individual embodiments described and illustrated herein has discrete components and features which may be readily separated from or combined with the features of any of the other several embodiments without departing from the scope or spirit of the present invention. Any recited method can be carried out in the order of events recited or in any other order which is logically possible.

It must be noted that as used herein and in the appended claims, the singular forms “a”, “an”, and “the” include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to “a cell” includes a plurality of such cells (e.g., a population of such cells) and reference to “the protein” includes reference to one or more proteins and equivalents thereof, e.g. polypeptides, known to those skilled in the art, and so forth.

The publications discussed herein are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that the present invention is not entitled to antedate such publication by virtue of prior invention. Further, the dates of publication provided may be different from the actual publication dates which may need to be independently confirmed.

Definitions

By a “DNA molecule” it is meant the polymeric form of deoxyribonucleotides (adenine, guanine, thymine, or cytosine) in either single stranded form or a double-stranded helix. This term refers only to the primary and secondary structure of the molecule, and does not limit it to any particular tertiary forms. Thus, this term includes double-stranded DNA found, inter alia, in linear DNA molecules (e.g., restriction fragments), viruses, plasmids, and chromosomes.

By a DNA “coding sequence” it is meant a DNA sequence which is transcribed and translated into a polypeptide when placed under the control of appropriate regulatory sequences. The boundaries of the coding sequence are determined by a start codon at the 5′ (amino) terminus and a translation stop codon at the 3′ (carboxyl) terminus. A transcription termination sequence may be located 3′ to the coding sequence.

“DNA regulatory sequences”, as used herein, are transcriptional and translational control sequences, such as promoters, terminators, ribosome binding sites (RBSs), and the like, that provide for and/or regulate expression of a coding sequence in a host cell.

In some embodiments, a subject nucleotide sequence (e.g., a promoter sequence) is modified relative to a corresponding wild type sequence. A “corresponding wild type sequence” is the wild type (naturally occurring) sequence that has the highest identity with the sequence in question. Such a sequence will usually have a similar function as the sequence in question, but this is not necessarily the case. For example, a synthetic promoter sequence has at least one mutation relative to a corresponding wild type promoter sequence, and the corresponding wild type promoter sequence is the wild type promoter sequence most similar to the synthetic sequence. Likewise, a synthetic RBS sequence has at least one mutation relative to a corresponding wild type RBS sequence, and the corresponding wild type RBS sequence is the wild type RBS sequence most similar to the synthetic sequence. A “corresponding wild type sequence” (e.g., nucleotide sequence, amino acid sequence) can be identified at the nucleotide sequence level (and when the sequence codes for a protein, the encoded amino acid sequence can also be evaluated) using any convenient method (e.g., using any convenient sequence comparison/alignment software such as BLAST, etc.). Such methods will be known and readily available to one of ordinary skill in the art.
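As a non-limiting illustration of identifying a “corresponding wild type sequence” as the wild type sequence having the highest identity with a sequence in question, the following sketch ranks candidates by percent identity. For simplicity it assumes that the sequences are already aligned and of equal length, so that identity is the fraction of matching positions; in practice an alignment tool such as BLAST would be used. All sequence names and sequences are hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Fraction of matching positions between two pre-aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return matches / len(seq_a)


def corresponding_wild_type(query, wild_types):
    """Return the name of the wild type sequence with the highest identity to query."""
    return max(wild_types, key=lambda name: percent_identity(query, wild_types[name]))


# Hypothetical wild type sequences and a synthetic variant of wtA with one mutation.
wild_types = {
    "wtA": "TTGACATATAAT",
    "wtB": "TTTACGTATGCT",
    "wtC": "GGGACATCCAAT",
}
synthetic = "TTGACATATCAT"
best = corresponding_wild_type(synthetic, wild_types)
print(best, percent_identity(synthetic, wild_types[best]))
```

Here the synthetic sequence differs from the hypothetical wtA at a single position, so wtA is returned as the corresponding wild type sequence.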

The terms “peptide,” “polypeptide,” and “protein” are used interchangeably herein, and refer to a polymeric form of amino acids of any length, which can include coded and non-coded amino acids, chemically or biochemically modified or derivatized amino acids, and polypeptides having modified peptide backbones.

The terms “host”, “host cell” and “recombinant host cell” are used interchangeably herein to indicate a prokaryotic cell into which one or more nucleic acids such as isolated and purified nucleic acids (e.g., vectors) have been introduced. It is understood that such terms refer not only to the particular subject cell but also to the progeny or potential progeny of such a cell. Because certain modifications may occur in succeeding generations due to either mutation or environmental influences, such progeny may not, in fact, be identical to the parent cell, but are still included within the scope of the term as used herein.

The terms “isolated” and “purified nucleic acid” refer to the state in which a nucleic acid can be. In such a case, the nucleic acids will be free or substantially free of material with which they are naturally associated such as other nucleic acids with which they are found in their natural environment, or the environment in which they are prepared (e.g. cell culture).

The terms “transformation”, “transformed” or “introducing a nucleic acid into a host cell” denote any process wherein an extracellular nucleic acid like a vector, with or without accompanying material, enters a host cell (e.g., a prokaryotic cell, a Bacteroides cell, an E. coli cell, etc.). The term “cell transformed” or “transformed cell” means the cell or its progeny into which the extracellular nucleic acid has been introduced and thus includes the extracellular nucleic acid. The introduced nucleic acid may or may not be integrated (covalently linked) into the genome of the cell. For example, in some cases, the introduced nucleic acid integrates into the genome of the cell (as a chromosomal integrant). In some cases, the introduced nucleic acid is maintained on an episomal element (extra chromosomal element) such as a plasmid.

Any convenient method can be used to introduce a nucleic acid into a prokaryotic cell, e.g., by electroporation (e.g., using electro-competent cells), by conjugation, by chemical methods (e.g., using chemically competent cells), and the like.

The amino acids described herein are preferred to be in the “L” isomeric form. The amino acid sequences are given in one-letter code (A: alanine; C: cysteine; D: aspartic acid; E: glutamic acid; F: phenylalanine; G: glycine; H: histidine; I: isoleucine; K: lysine; L: leucine; M: methionine; N: asparagine; P: proline; Q: glutamine; R: arginine; S: serine; T: threonine; V: valine; W: tryptophan; Y: tyrosine; X: any residue). In keeping with standard polypeptide nomenclature, NH2 refers to the free amino group present at the amino terminus (the N terminus) of a polypeptide, while COOH refers to the free carboxy group present at the carboxy terminus (the C terminus) of a polypeptide.
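As a small illustration of the one-letter code listed above, the following Python sketch expands a one-letter sequence into residue names. The function names and the example sequence are hypothetical, and the sketch assumes the usual convention that a sequence is written from the N terminus to the C terminus.

```python
# One-letter amino acid code, transcribed from the paragraph above.
ONE_LETTER_CODE = {
    "A": "alanine", "C": "cysteine", "D": "aspartic acid", "E": "glutamic acid",
    "F": "phenylalanine", "G": "glycine", "H": "histidine", "I": "isoleucine",
    "K": "lysine", "L": "leucine", "M": "methionine", "N": "asparagine",
    "P": "proline", "Q": "glutamine", "R": "arginine", "S": "serine",
    "T": "threonine", "V": "valine", "W": "tryptophan", "Y": "tyrosine",
    "X": "any residue",
}


def expand_sequence(one_letter_sequence):
    """Return residue names in the order given (assumed N terminus first)."""
    return [ONE_LETTER_CODE[residue] for residue in one_letter_sequence.upper()]


def describe(one_letter_sequence):
    """Render a sequence with explicit NH2 and COOH termini (illustrative only)."""
    return "NH2-" + "-".join(expand_sequence(one_letter_sequence)) + "-COOH"


# Hypothetical three-residue example, not taken from the sequence listing.
print(describe("MKX"))
```

A letter outside the table raises a KeyError, which is a deliberate choice here so that unexpected symbols are not silently accepted.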

General methods in molecular and cellular biochemistry can be found in such standard textbooks as Molecular Cloning: A Laboratory Manual, 3rd Ed. (Sambrook et al., Cold Spring Harbor Laboratory Press 2001); Short Protocols in Molecular Biology, 4th Ed. (Ausubel et al. eds., John Wiley & Sons 1999); Protein Methods (Bollag et al., John Wiley & Sons 1996); Nonviral Vectors for Gene Therapy (Wagner et al. eds., Academic Press 1999); and Viral Vectors (Kaplitt & Loewy eds., Academic Press 1995); the disclosures of which are incorporated herein by reference. Reagents, cloning vectors, and kits for genetic manipulation referred to in this disclosure are in many cases available from commercial vendors such as BioRad, Stratagene, Invitrogen, Sigma-Aldrich, and ClonTech.

Compositions

Provided are nucleic acids (e.g., expression vectors) that include a promoter operably linked to a nucleotide sequence of interest. In some cases, a subject promoter is operable (functional) in a prokaryotic cell (e.g., a Bacteroides cell). Also provided are prokaryotic cells such as Bacteroides cells.

Bacteroides Cells

The term “Bacteroides cell” is used herein to refer to a cell of the genus Bacteroides (e.g., when referencing cells in which a subject promoter is operable, in the context of a cell that includes a subject nucleic acid, in the context of a subject method, and the like). Likewise, the term “Bacteroides phage” refers to a phage that infects a Bacteroides cell (i.e., a phage that infects a cell of the genus Bacteroides).

As such, in some cases, a subject cell is a Bacteroides cell. In some cases, a subject promoter is operable in a Bacteroides cell. In some cases, a subject promoter is (or is derived from) a Bacteroides phage promoter. In some cases, a subject cell (e.g., a cell of a subject method, a cell that includes a subject nucleic acid, a cell in which a subject promoter is operable, and the like) is a Bacteroides cell. Examples of species within the genus Bacteroides include but are not limited to: B. fragilis (Bf), B. distasonis (Bd), B. thetaiotaomicron (Bt), B. vulgatus (Bv), B. ovatus (Bo), B. eggerthii (Be), B. merdae (Bm), B. stercoris (Bs), B. uniformis (Bu), and B. caccae (Bc).

In some cases, a subject Bacteroides cell (e.g., a cell of a subject method, a cell that includes a subject nucleic acid, a cell in which a subject promoter is operable, and the like) is a species selected from: B. fragilis (Bf), B. thetaiotaomicron (Bt), B. vulgatus (Bv), B. ovatus (Bo), and B. uniformis (Bu). In some cases, the Bacteroides cell is a species selected from: B. fragilis (Bf), B. distasonis (Bd), B. thetaiotaomicron (Bt), B. vulgatus (Bv), B. ovatus (Bo), B. eggerthii (Be), B. merdae (Bm), B. stercoris (Bs), B. uniformis (Bu), and B. caccae (Bc). In some cases, the Bacteroides cell is a species selected from: B. fragilis (Bf), B. thetaiotaomicron (Bt), B. vulgatus (Bv), B. ovatus (Bo), and B. uniformis (Bu). In some cases, the Bacteroides cell is a species selected from: B. thetaiotaomicron (Bt), B. vulgatus (Bv), B. ovatus (Bo), and B. uniformis (Bu).

Promoter

As noted above, provided are nucleic acids (e.g., expression vectors) that include a promoter operably linked to a nucleotide sequence of interest. As used herein, a “promoter” or “promoter sequence” is a DNA regulatory region capable of recruiting RNA polymerase in a cell and initiating transcription of a downstream (3′ direction) sequence. Thus, a promoter is a nucleic acid sequence sufficient to direct transcription of a nucleic acid sequence to which it is operably linked.

The promoter of a subject nucleic acid is operable in a Bacteroides cell. When a promoter is operable in a Bacteroides cell, the promoter is functional in a cell of the genus Bacteroides. Because some promoters can be operable in more than one type of cell, a phrase such as “operable in a Bacteroides cell” or “operable in Bacteroides cells” is not limiting in the sense that it does not mean that such a promoter is not operable in other cell types (i.e., it does not mean that the promoter is not functional in other prokaryotic cells). For example, a promoter that is operable in Bacteroides cells may also be operable in other types of prokaryotic cells (e.g., E. coli cells) (e.g., see FIG. 20). Thus, in some cases, a subject promoter, in addition to being operable in Bacteroides cells, is also operable in non-Bacteroides cells (e.g., prokaryotic cells such as E. coli cells).

In some cases, a subject promoter is operable in a Bacteroides cell (e.g., B. fragilis (Bf), B. distasonis (Bd), B. thetaiotaomicron (Bt), B. vulgatus (Bv), B. ovatus (Bo), B. eggerthii (Be), B. merdae (Bm), B. stercoris (Bs), B. uniformis (Bu), B. caccae (Bc), and the like). In some cases, a subject promoter is operable in a Bacteroides cell selected from: B. fragilis (Bf), B. distasonis (Bd), B. thetaiotaomicron (Bt), B. vulgatus (Bv), B. ovatus (Bo), B. eggerthii (Be), B. merdae (Bm), B. stercoris (Bs), B. uniformis (Bu), and B. caccae (Bc). In some cases, a subject promoter is operable in a Bacteroides cell selected from: B. fragilis (Bf), B. thetaiotaomicron (Bt), B. vulgatus (Bv), B. ovatus (Bo), and B. uniformis (Bu). In some cases, a subject promoter is operable in a Bacteroides cell selected from: B. thetaiotaomicron (Bt), B. vulgatus (Bv), B. ovatus (Bo), and B. uniformis (Bu). In some cases, a subject promoter is operable in prokaryotic cells (e.g., Bacteroides cells, E. coli, etc.). In some cases, a subject promoter is operable in E. coli.

In some embodiments, a promoter of a subject nucleic acid includes a nucleotide sequence of a wild type (i.e., naturally occurring) promoter from a phage (e.g., a Bacteroides phage, i.e., a phage that infects Bacteroides cells). For example, in some cases, a promoter of a subject nucleic acid includes the Bacteroides phage promoter sequence set forth in any of SEQ ID NOs: 8, 388-397, and 405-407. In some cases, a promoter of a subject nucleic acid includes the Bacteroides phage promoter sequence set forth in any of SEQ ID NOs: 388 and 407. In some cases, a promoter of a subject nucleic acid includes the Bacteroides phage promoter sequence set forth in SEQ ID NO: 8. In some cases, a promoter of a subject nucleic acid includes the Bacteroides phage promoter sequence set forth in SEQ ID NO: 388. In some cases, a promoter of a subject nucleic acid includes the Bacteroides phage promoter sequence set forth in SEQ ID NO: 406. In some cases, a promoter of a subject nucleic acid includes the Bacteroides phage promoter sequence set forth in SEQ ID NO: 407. In some cases, a promoter of a subject nucleic acid is a synthetic promoter (i.e., not naturally occurring, e.g., a sequence that has at least one mutation relative to a corresponding wild type promoter sequence).

As described below in the examples section, the inventors have isolated at least two wild type phage promoter sequences, performed mutagenesis and truncation experiments, and performed sequence alignments to identify positions within the promoter sequences that account for controlling expression of an operably linked nucleotide sequence of interest. For example, in some embodiments, a promoter of a subject nucleic acid includes the nucleotide sequence: GTTAA (n)_(x1) GTTAA (n)_(x2) TA (n)₂ TTTG (SEQ ID NO: 400), where:

(1) x1 can be an integer in a range of from 3-7 (e.g., in some cases x1 is an integer in a range of from 4-6, in some cases x1 is 4, and in some cases, x1 is 6); and
(2) x2 can be an integer in a range of from 36-38 (e.g., in some cases x2 is 37).

In some cases, x1 is an integer in a range of from 3-7 and x2 is an integer in a range of from 36-38. In some cases, x1 is an integer in a range of from 4-6 and x2 is an integer in a range of from 36-38. In some cases, x1 is an integer in a range of from 3-7 and x2 is 37. In some cases, x1 is an integer in a range of from 4-6 and x2 is 37.
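The consensus of SEQ ID NO: 400 can be read as a pattern with fixed motifs separated by spacers of bounded length. The following Python sketch expresses that reading as a regular expression. It is illustrative only: the function names and the test sequences are hypothetical, and the sketch assumes that “n” means any of A, C, G, or T.

```python
import re


def seq400_pattern(x1_range=(3, 7), x2_range=(36, 38)):
    """Build a regex for GTTAA (n)x1 GTTAA (n)x2 TA (n)2 TTTG.

    x1_range and x2_range are inclusive (min, max) spacer lengths.
    """
    n = "[ACGT]"  # assumption: "n" is any one of A, C, G, T
    return re.compile(
        "GTTAA" + n + "{%d,%d}" % x1_range
        + "GTTAA" + n + "{%d,%d}" % x2_range
        + "TA" + n + "{2}" + "TTTG"
    )


def contains_motif(dna, pattern=None):
    """Return True if the consensus occurs anywhere in the given DNA string."""
    pattern = pattern or seq400_pattern()
    return pattern.search(dna.upper()) is not None


# Hypothetical sequences built only to exercise the pattern (x1 = 4).
inside_range = "GTTAA" + "ACGT" + "GTTAA" + "C" * 37 + "TA" + "GC" + "TTTG"
outside_range = "GTTAA" + "ACGT" + "GTTAA" + "C" * 30 + "TA" + "GC" + "TTTG"

print(contains_motif(inside_range))   # x2 = 37, within 36-38
print(contains_motif(outside_range))  # x2 = 30, outside 36-38
```

Narrower embodiments can be checked by passing other bounds, for example `seq400_pattern((4, 6), (37, 37))` for x1 in 4-6 and x2 equal to 37.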

In some embodiments, a promoter of a subject nucleic acid includes a nucleotide sequence having 80% or more identity (e.g., 85% or more, 90% or more, 95% or more, or 100% identity) with the nucleotide sequence: GTTAA (n)_(x1) GTTAA (n)_(x2) TA (n)₂ TTTG (SEQ ID NO: 400) (where the percent identity is calculated using only the defined nucleotides of the sequence set forth in SEQ ID NO: 400), where:

(1) x1 can be an integer in a range of from 3-7 (e.g., in some cases x1 is an integer in a range of from 4-6, in some cases x1 is 4, and in some cases, x1 is 6); and
(2) x2 can be an integer in a range of from 36-38 (e.g., in some cases x2 is 37).

In some cases, x1 is an integer in a range of from 3-7 and x2 is an integer in a range of from 36-38. In some cases, x1 is an integer in a range of from 4-6 and x2 is an integer in a range of from 36-38. In some cases, x1 is an integer in a range of from 3-7 and x2 is 37. In some cases, x1 is an integer in a range of from 4-6 and x2 is 37.
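One way to read “percent identity calculated using only the defined nucleotides” is to compare a candidate against the consensus position by position and to ignore every “n” position. The Python sketch below does this under stated assumptions: the comparison is ungapped, the spacer lengths x1 and x2 are chosen in advance, and the helper names are hypothetical. The disclosure does not specify an alignment procedure, so this is not presented as the method actually used.

```python
def build_template(x1, x2):
    """SEQ ID NO: 400 consensus for fixed spacer lengths; 'n' marks undefined positions."""
    return "GTTAA" + "n" * x1 + "GTTAA" + "n" * x2 + "TA" + "nn" + "TTTG"


def defined_identity(candidate, template):
    """Percent identity over the defined (non-'n') template positions only.

    Assumes an ungapped, equal-length comparison (an illustrative simplification).
    """
    if len(candidate) != len(template):
        raise ValueError("candidate and template must have the same length")
    defined = [(c, t) for c, t in zip(candidate.upper(), template) if t != "n"]
    matches = sum(1 for c, t in defined if c == t)
    return 100.0 * matches / len(defined)


template = build_template(4, 37)
perfect = template.replace("n", "A")   # hypothetical candidate, all defined positions match
mutated = "C" + perfect[1:]            # one defined position changed (G to C)

print(defined_identity(perfect, template))
print(defined_identity(mutated, template))
```

SEQ ID NO: 400 has 16 defined nucleotides (5 + 5 + 2 + 4), so under this reading an 80% threshold tolerates up to three mismatches at defined positions (13/16 = 81.25%), while four mismatches give 75%.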

In some embodiments, a promoter of a subject nucleic acid includes the nucleotide sequence: GTTAA (n)_(x1) GTTAA (n)_(x2) TA (n)₂ TTTG (n)_(x3) GAA (SEQ ID NO: 401), where:

(1) x1 can be an integer in a range of from 3-7 (e.g., in some cases x1 is an integer in a range of from 4-6, in some cases x1 is 4, and in some cases, x1 is 6);
(2) x2 can be an integer in a range of from 36-38 (e.g., in some cases x2 is 37); and
(3) x3 can be an integer in a range of from 4-12 (e.g., in some cases x3 is an integer in a range of from 7-11, in some cases x3 is an integer in a range of from 4-7, in some cases x3 is an integer in a range of from 6-8, in some cases x3 is 7, in some cases x3 is 11).

In some cases, x1 is an integer in a range of from 3-7, x2 is an integer in a range of from 36-38, and x3 is an integer in a range of from 4-12. In some cases, x1 is an integer in a range of from 3-7, x2 is an integer in a range of from 36-38, and x3 is an integer in a range of from 4-7. In some cases, x1 is an integer in a range of from 3-7, x2 is an integer in a range of from 36-38, and x3 is an integer in a range of from 7-11. In some cases, x1 is an integer in a range of from 4-6, x2 is an integer in a range of from 36-38, and x3 is an integer in a range of from 6-8. In some cases, x1 is an integer in a range of from 4-6, x2 is an integer in a range of from 36-38, and x3 is 7. In some cases, x1 is an integer in a range of from 3-7, x2 is 37, and x3 is 7. In some cases, x1 is an integer in a range of from 4-6, x2 is 37, and x3 is 7. In some cases, x1 is an integer in a range of from 3-7, x2 is 37, and x3 is an integer in a range of from 7-11. In some cases, x1 is an integer in a range of from 4-6, x2 is 37, and x3 is an integer in a range of from 7-11.
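The broadest ranges recited for SEQ ID NO: 401 define a finite set of spacer-length combinations, which can be enumerated directly. The following Python sketch counts those combinations and the resulting motif lengths. The names are hypothetical, and the arithmetic simply adds the fixed motif lengths (GTTAA, GTTAA, TA, the two-nucleotide spacer, TTTG, GAA) to the three variable spacers.

```python
from itertools import product

# Broadest inclusive ranges recited above for SEQ ID NO: 401.
X1_VALUES = range(3, 8)    # x1: 3-7
X2_VALUES = range(36, 39)  # x2: 36-38
X3_VALUES = range(4, 13)   # x3: 4-12

# Fixed positions: GTTAA + GTTAA + TA + (n)2 + TTTG + GAA
FIXED_LENGTH = 5 + 5 + 2 + 2 + 4 + 3


def seq401_length(x1, x2, x3):
    """Total length of the SEQ ID NO: 401 consensus for given spacer lengths."""
    return FIXED_LENGTH + x1 + x2 + x3


combinations = list(product(X1_VALUES, X2_VALUES, X3_VALUES))
lengths = [seq401_length(*combo) for combo in combinations]

print(len(combinations))          # number of (x1, x2, x3) combinations
print(min(lengths), max(lengths))  # shortest and longest consensus span
```

The narrower embodiments (for example x1 in 4-6, x2 equal to 37, x3 equal to 7) are subsets of this enumeration and can be obtained by filtering `combinations`.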

In some embodiments, a promoter of a subject nucleic acid includes a nucleotide sequence having 80% or more identity (e.g., 85% or more, 90% or more, 95% or more, or 100% identity) with the nucleotide sequence: GTTAA (n)_(x1) GTTAA (n)_(x2) TA (n)₂ TTTG (n)_(x3) GAA (SEQ ID NO: 401) (where the percent identity is calculated using only the defined nucleotides of the sequence set forth in SEQ ID NO: 401), where:

(1) x1 can be an integer in a range of from 3-7 (e.g., in some cases x1 is an integer in a range of from 4-6, in some cases x1 is 4, and in some cases, x1 is 6);
(2) x2 can be an integer in a range of from 36-38 (e.g., in some cases x2 is 37); and
(3) x3 can be an integer in a range of from 4-12 (e.g., in some cases x3 is an integer in a range of from 7-11, in some cases x3 is an integer in a range of from 4-7, in some cases x3 is an integer in a range of from 6-8, in some cases x3 is 7, in some cases x3 is 11).

In some cases, x1 is an integer in a range of from 3-7, x2 is an integer in a range of from 36-38, and x3 is an integer in a range of from 4-12. In some cases, x1 is an integer in a range of from 3-7, x2 is an integer in a range of from 36-38, and x3 is an integer in a range of from 4-7. In some cases, x1 is an integer in a range of from 3-7, x2 is an integer in a range of from 36-38, and x3 is an integer in a range of from 7-11. In some cases, x1 is an integer in a range of from 4-6, x2 is an integer in a range of from 36-38, and x3 is an integer in a range of from 6-8. In some cases, x1 is an integer in a range of from 4-6, x2 is an integer in a range of from 36-38, and x3 is 7. In some cases, x1 is an integer in a range of from 3-7, x2 is 37, and x3 is 7. In some cases, x1 is an integer in a range of from 4-6, x2 is 37, and x3 is 7. In some cases, x1 is an integer in a range of from 3-7, x2 is 37, and x3 is an integer in a range of from 7-11. In some cases, x1 is an integer in a range of from 4-6, x2 is 37, and x3 is an integer in a range of from 7-11.

In some embodiments, a promoter of a subject nucleic acid includes the nucleotide sequence: GTTAA (n)_(x1) GTTAA (n)_(x2) TTG (n)_(x3) TA (n)₂ TTTG (SEQ ID NO: 402) where:

(1) x1 can be an integer in a range of from 3-7 (e.g., in some cases x1 is an integer in a range of from 4-6, in some cases x1 is 4, and in some cases, x1 is 6);
(2) x2 can be an integer in a range of from 14-16 (e.g., in some cases x2 is 15); and
(3) x3 can be an integer in a range of from 18-20 (e.g., in some cases x3 is 19).

In some cases, x1 is an integer in a range of from 3-7, x2 is an integer in a range of from 14-16, and x3 is an integer in a range of from 18-20. In some cases, x1 is an integer in a range of from 4-6, x2 is an integer in a range of from 14-16, and x3 is an integer in a range of from 18-20. In some cases, x1 is an integer in a range of from 3-7, x2 is 15, and x3 is 19. In some cases, x1 is an integer in a range of from 4-6, x2 is 15, and x3 is 19.

In some embodiments, a promoter of a subject nucleic acid includes a nucleotide sequence having 80% or more identity (e.g., 85% or more, 90% or more, 95% or more, or 100% identity) with the nucleotide sequence: GTTAA (n)_(x1) GTTAA (n)_(x2) TTG (n)_(x3) TA (n)₂ TTTG (SEQ ID NO: 402) (where the percent identity is calculated using only the defined nucleotides of the sequence set forth in SEQ ID NO: 402), where:

(1) x1 can be an integer in a range of from 3-7 (e.g., in some cases x1 is an integer in a range of from 4-6, in some cases x1 is 4, and in some cases, x1 is 6);
(2) x2 can be an integer in a range of from 14-16 (e.g., in some cases x2 is 15); and
(3) x3 can be an integer in a range of from 18-20 (e.g., in some cases x3 is 19).

In some cases, x1 is an integer in a range of from 3-7, x2 is an integer in a range of from 14-16, and x3 is an integer in a range of from 18-20. In some cases, x1 is an integer in a range of from 4-6, x2 is an integer in a range of from 14-16, and x3 is an integer in a range of from 18-20. In some cases, x1 is an integer in a range of from 3-7, x2 is 15, and x3 is 19. In some cases, x1 is an integer in a range of from 4-6, x2 is 15, and x3 is 19.

In some embodiments, a promoter of a subject nucleic acid includes the nucleotide sequence: GTTAA (n)_(x1) GTTAAA (n)_(x2) TTG (n)_(x3) TA (n)₂ TTTG (SEQ ID NO: 404) where:

(1) x1 can be an integer in a range of from 3-7 (e.g., in some cases x1 is an integer in a range of from 4-6, in some cases x1 is 4, and in some cases, x1 is 6);
(2) x2 can be an integer in a range of from 14-16 (e.g., in some cases x2 is 15); and
(3) x3 can be an integer in a range of from 18-20 (e.g., in some cases x3 is 19).

In some cases, x1 is an integer in a range of from 3-7, x2 is an integer in a range of from 14-16, and x3 is an integer in a range of from 18-20. In some cases, x1 is an integer in a range of from 4-6, x2 is an integer in a range of from 14-16, and x3 is an integer in a range of from 18-20. In some cases, x1 is an integer in a range of from 3-7, x2 is 15, and x3 is 19. In some cases, x1 is an integer in a range of from 4-6, x2 is 15, and x3 is 19.

In some embodiments, a promoter of a subject nucleic acid includes a nucleotide sequence having 80% or more identity (e.g., 85% or more, 90% or more, 95% or more, or 100% identity) with the nucleotide sequence: GTTAA (n)_(x1) GTTAAA (n)_(x2) TTG (n)_(x3) TA (n)₂ TTTG (SEQ ID NO: 404) (where the percent identity is calculated using only the defined nucleotides of the sequence set forth in SEQ ID NO: 404), where:

(1) x1 can be an integer in a range of from 3-7 (e.g., in some cases x1 is an integer in a range of from 4-6, in some cases x1 is 4, and in some cases, x1 is 6);
(2) x2 can be an integer in a range of from 14-16 (e.g., in some cases x2 is 15); and
(3) x3 can be an integer in a range of from 18-20 (e.g., in some cases x3 is 19).

In some cases, x1 is an integer in a range of from 3-7, x2 is an integer in a range of from 14-16, and x3 is an integer in a range of from 18-20. In some cases, x1 is an integer in a range of from 4-6, x2 is an integer in a range of from 14-16, and x3 is an integer in a range of from 18-20. In some cases, x1 is an integer in a range of from 3-7, x2 is 15, and x3 is 19. In some cases, x1 is an integer in a range of from 4-6, x2 is 15, and x3 is 19.

In some embodiments, a promoter of a subject nucleic acid includes the nucleotide sequence: GTTAA (n)_(x1) GTTAA (n)_(x2) TTG (n)_(x3) TA (n)₂ TTTG (n)_(x4) GAA (SEQ ID NO: 403) where:

(1) x1 can be an integer in a range of from 3-7 (e.g., in some cases x1 is an integer in a range of from 4-6, in some cases x1 is 4, and in some cases, x1 is 6);
(2) x2 can be an integer in a range of from 14-16 (e.g., in some cases x2 is 15);
(3) x3 can be an integer in a range of from 18-20 (e.g., in some cases x3 is 19); and
(4) x4 can be an integer in a range of from 4-12 (e.g., in some cases x4 is an integer in a range of from 7-11, in some cases x4 is an integer in a range of from 4-7, in some cases x4 is an integer in a range of from 6-8, in some cases x4 is 7, in some cases x4 is 11).

In some cases, x1 is an integer in a range of from 3-7, x2 is an integer in a range of from 14-16, x3 is an integer in a range of from 18-20, and x4 is an integer in a range of from 4-12. In some cases, x1 is an integer in a range of from 3-7, x2 is an integer in a range of from 14-16, x3 is an integer in a range of from 18-20, and x4 is an integer in a range of from 4-7. In some cases, x1 is an integer in a range of from 3-7, x2 is an integer in a range of from 14-16, x3 is an integer in a range of from 18-20, and x4 is an integer in a range of from 7-11. In some cases, x1 is an integer in a range of from 4-6, x2 is an integer in a range of from 14-16, x3 is an integer in a range of from 18-20, and x4 is an integer in a range of from 6-8. In some cases, x1 is an integer in a range of from 4-6, x2 is an integer in a range of from 14-16, x3 is an integer in a range of from 18-20, and x4 is 7. In some cases, x1 is an integer in a range of from 3-7, x2 is 15, x3 is 19, and x4 is an integer in a range of from 7-11. In some cases, x1 is an integer in a range of from 3-7, x2 is 15, x3 is 19, and x4 is 7. In some cases, x1 is an integer in a range of from 4-6, x2 is 15, x3 is 19, and x4 is an integer in a range of from 7-11. In some cases, x1 is an integer in a range of from 4-6, x2 is 15, x3 is 19, and x4 is 7.

In some embodiments, a promoter of a subject nucleic acid includes a nucleotide sequence having 80% or more identity (e.g., 85% or more, 90% or more, 95% or more, or 100% identity) with the nucleotide sequence: GTTAA (n)_(x1) GTTAA (n)_(x2) TTG (n)_(x3) TA (n)₂ TTTG (n)_(x4) GAA (SEQ ID NO: 403) (where the percent identity is calculated using only the defined nucleotides of the sequence set forth in SEQ ID NO: 403), where:

(1) x1 can be an integer in a range of from 3-7 (e.g., in some cases x1 is an integer in a range of from 4-6, in some cases x1 is 4, and in some cases, x1 is 6);
(2) x2 can be an integer in a range of from 14-16 (e.g., in some cases x2 is 15);
(3) x3 can be an integer in a range of from 18-20 (e.g., in some cases x3 is 19); and
(4) x4 can be an integer in a range of from 4-12 (e.g., in some cases x4 is an integer in a range of from 7-11, in some cases x4 is an integer in a range of from 4-7, in some cases x4 is an integer in a range of from 6-8, in some cases x4 is 7, in some cases x4 is 11).

In some cases, x1 is an integer in a range of from 3-7, x2 is an integer in a range of from 14-16, x3 is an integer in a range of from 18-20, and x4 is an integer in a range of from 4-12. In some cases, x1 is an integer in a range of from 3-7, x2 is an integer in a range of from 14-16, x3 is an integer in a range of from 18-20, and x4 is an integer in a range of from 4-7. In some cases, x1 is an integer in a range of from 3-7, x2 is an integer in a range of from 14-16, x3 is an integer in a range of from 18-20, and x4 is an integer in a range of from 7-11. In some cases, x1 is an integer in a range of from 4-6, x2 is an integer in a range of from 14-16, x3 is an integer in a range of from 18-20, and x4 is an integer in a range of from 6-8. In some cases, x1 is an integer in a range of from 4-6, x2 is an integer in a range of from 14-16, x3 is an integer in a range of from 18-20, and x4 is 7. In some cases, x1 is an integer in a range of from 3-7, x2 is 15, x3 is 19, and x4 is an integer in a range of from 7-11. In some cases, x1 is an integer in a range of from 3-7, x2 is 15, x3 is 19, and x4 is 7. In some cases, x1 is an integer in a range of from 4-6, x2 is 15, x3 is 19, and x4 is an integer in a range of from 7-11. In some cases, x1 is an integer in a range of from 4-6, x2 is 15, x3 is 19, and x4 is 7.

In some embodiments, a promoter of a subject nucleic acid includes a nucleotide sequence of the group of nucleotide sequences presented in Table 13, wherein “n” represents a nucleotide that is independently selected from A, C, G, and T. In some embodiments, a promoter of a subject nucleic acid may include a nucleotide sequence having 80% or more identity to a nucleotide sequence presented in Table 13, wherein the percent identity is calculated using only the defined nucleotides. In some cases, the promoter may include a nucleotide sequence having 85% or more, 90% or more, 95% or more, or 100% identity to a nucleotide sequence presented in Table 13.

TABLE 13
Consensus promoter sequences of the disclosure.

Consensus Sequence
GTTAA (n)₄₋₇ GTTAA (n)₁₂₋₁₆ TTG (n)₁₈₋₂₂ TA (n)₂ TTTG (SEQ ID NO: 492)
GTTAA (n)₄₋₈ GTTAA (n)₁₂₋₁₆ TTG (n)₁₈₋₂₂ TA (n)₂ TTTG (SEQ ID NO: 493)
GTTAA (n)₃₋₇ GTTAA (n)₁₂₋₁₆ TTG (n)₁₈₋₂₂ TA (n)₂ TTTG (SEQ ID NO: 494)
GTTAA (n)₄₋₇ GTTAA (n)₁₂₋₁₆ TTG (n)₁₈₋₂₂ TA (n)₂ TTTGC (SEQ ID NO: 495)
GTTAA (n)₃₋₇ GTTAA (n)₃₆₋₃₈ TA (n)₂ TTTG (SEQ ID NO: 496)
GTTAA (n)₄₋₇ GTTAA (n)₃₆₋₃₈ TA (n)₂ TTTG (SEQ ID NO: 497)
GTTAA (n)₄₋₇ GTTAA (n)₃₄₋₃₈ TA (n)₂ TTTG (SEQ ID NO: 498)
GTTAA (n)₄₋₇ GTTAA (n)₃₆₋₃₉ TA (n)₂ TTTG (SEQ ID NO: 499)
GTTAA (n)₄₋₇ GTTAA (n)₃₆₋₃₉ TA (n)₂ TTTGC (SEQ ID NO: 500)
GTTAA (n)₀₋₂₀ GTTAA (n)₁₀₋₆₀ TA (n)₀₋₁₀ TTTG (SEQ ID NO: 501)
TTAA (n)₀₋₁₀ TTAA (n)₃₀₋₅₀ TA (n)₂ TTTG (SEQ ID NO: 502)
GTTAA (n)₄₋₇ GTTAA (SEQ ID NO: 503)
GTTAA (n)₄₈₋₅₄ TTTG (SEQ ID NO: 504)
GTTAA (n)₃₆₋₃₈ TA (SEQ ID NO: 505)
GTTAA (n)₄₀₋₄₂ TTTG (SEQ ID NO: 506)
GTTAA (n)₃₋₇ GTTAA (n)₃₆₋₃₈ TA (SEQ ID NO: 507)
GTTAA (n)₃₋₇ GTTAA (n)₄₀₋₄₂ TTTG (SEQ ID NO: 508)
GTTAA (n)₄₄₋₅₀ TA (n)₂ TTTG (SEQ ID NO: 509)
GTTAA (n)₃₆₋₃₈ TA (n)₂ TTTG (SEQ ID NO: 510)
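The notation used in Table 13 (fixed motifs separated by “(n)” spacers with a length or length range) maps naturally onto a regular expression. The Python sketch below converts one such entry into a compiled pattern. It is a hypothetical helper, not part of the disclosure, and it assumes the entry has been transcribed into plain ASCII, with the subscript written as digits directly after “(n)” (for example “(n)4-7”) and tokens separated by spaces.

```python
import re

# A token is either a spacer "(n)LOW" / "(n)LOW-HIGH" or a literal run of A/C/G/T.
TOKEN = re.compile(r"\(n\)(\d+)(?:-(\d+))?|([ACGT]+)")


def consensus_to_regex(consensus):
    """Convert an ASCII-transcribed Table 13 entry into a compiled regex."""
    parts = []
    for token in consensus.split():
        match = TOKEN.fullmatch(token)
        if match is None:
            raise ValueError("unrecognized token: %r" % token)
        low, high, literal = match.groups()
        if literal:
            parts.append(literal)
        else:
            high = high or low  # a single number means a fixed spacer length
            parts.append("[ACGT]{%s,%s}" % (low, high))
    return re.compile("".join(parts))


# ASCII transcription of the SEQ ID NO: 497 row of Table 13.
seq497 = consensus_to_regex("GTTAA (n)4-7 GTTAA (n)36-38 TA (n)2 TTTG")
print(seq497.pattern)

# Hypothetical test sequence with spacers of 5 and 36 nucleotides.
candidate = "GTTAA" + "ACGTA" + "GTTAA" + "C" * 36 + "TA" + "GG" + "TTTG"
print(seq497.fullmatch(candidate) is not None)
```

Using `fullmatch` tests whether a sequence is exactly an instance of the consensus; `search` would instead test whether the consensus occurs anywhere within a longer sequence.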

The above sequences (SEQ ID NOs: 400-404) are found in SEQ ID NOs: 8, 388, 393, 394, 397, and 406-407 (see Table 6, Table 7, and FIG. 20). For example, see FIG. 20 for an alignment of two identified promoter sequences and FIG. 2 (panel d), which depicts results from mutagenesis experiments throughout a promoter sequence of SEQ ID NO: 8.

In some cases, a promoter of a subject nucleic acid satisfies one or more of the formulas above (e.g., having X% identity to any of SEQ ID NOs: 400-404) and also has identity with a Bacteroides phage promoter sequence set forth herein (for examples, see SEQ ID NOs: 1-8, 151-364, 381-388, and 405-407). Thus, in some cases, a promoter of a subject nucleic acid includes a nucleotide sequence having: (1) X% identity to any of SEQ ID NOs: 400-404; and/or (2) X% identity with a promoter sequence set forth herein (see the paragraphs below for examples). As one illustrative example, in some cases, a promoter of a subject nucleic acid includes a nucleotide sequence having: (1) 80% or more identity with the sequence set forth in any one of SEQ ID NOs: 400-404; and/or (2) 80% or more identity with the promoter sequence set forth in any of SEQ ID NOs: 388 and 407. Any combination of the above (X% identity to any of SEQ ID NOs: 400-404) with the below (e.g., X% identity with a promoter sequence set forth herein, e.g., as a substitute for “388 and 407” in the previous sentence) is suitable, and any combination can be separated by an “and/or” as exemplified in this paragraph.

Examples of promoter sequences operable in Bacteroides cells include, but are not limited to, those presented in Tables 4-7. For example, in some cases, the promoter of a subject nucleic acid includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the Bacteroides phage promoter sequence set forth as SEQ ID NO: 8. In some cases, the promoter includes a nucleotide sequence having 90% or more identity (e.g., 92% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the Bacteroides phage promoter sequence set forth as SEQ ID NO: 8.

In some cases, the promoter of a subject nucleic acid includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the Bacteroides phage promoter sequence set forth as SEQ ID NO: 388. In some cases, the promoter includes a nucleotide sequence having 90% or more identity (e.g., 92% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the Bacteroides phage promoter sequence set forth as SEQ ID NO: 388.

In some cases, the promoter of a subject nucleic acid includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the Bacteroides phage promoter sequence set forth as SEQ ID NO: 407 (or in some cases SEQ ID NO: 406). In some cases, the promoter includes a nucleotide sequence having 90% or more identity (e.g., 92% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the Bacteroides phage promoter sequence set forth as SEQ ID NO: 407 (or in some cases SEQ ID NO: 406).

In some cases, the promoter of a subject nucleic acid includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the promoter sequence set forth in any of SEQ ID NOs: 388 and 406 (or in some cases SEQ ID NOs: 388 and 407). In some cases, the promoter of a subject nucleic acid includes a nucleotide sequence having 90% or more identity (e.g., 92% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the promoter sequence set forth in any of SEQ ID NOs: 388 and 406 (or in some cases SEQ ID NOs: 388 and 407). In some cases, the promoter of a subject nucleic acid includes the promoter sequence set forth in any of SEQ ID NOs: 388 and 406 (or in some cases SEQ ID NOs: 388 and 407).

In some cases, the promoter of a subject nucleic acid includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the promoter sequence set forth in any of SEQ ID NOs: 1-8. In some cases, the promoter of a subject nucleic acid includes a nucleotide sequence having 90% or more identity (e.g., 92% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the promoter sequence set forth in any of SEQ ID NOs: 1-8. In some cases, the promoter of a subject nucleic acid includes the promoter sequence set forth in any of SEQ ID NOs: 1-8.

In some cases, the promoter of a subject nucleic acid includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the promoter sequence set forth in any of SEQ ID NOs: 1-8 and 381-388. In some cases, the promoter of a subject nucleic acid includes a nucleotide sequence having 90% or more identity (e.g., 92% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the promoter sequence set forth in any of SEQ ID NOs: 1-8 and 381-388. In some cases, the promoter of a subject nucleic acid includes the promoter sequence set forth in any of SEQ ID NOs: 1-8 and 381-388.

In some cases, the promoter of a subject nucleic acid includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the promoter sequence set forth in any of SEQ ID NOs: 1-8, 151-364, and 381-388. In some cases, the promoter of a subject nucleic acid includes a nucleotide sequence having 90% or more identity (e.g., 92% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the promoter sequence set forth in any of SEQ ID NOs: 1-8, 151-364, and 381-388. In some cases, the promoter of a subject nucleic acid includes the promoter sequence set forth in any of SEQ ID NOs: 1-8, 151-364, and 381-388.

In some cases, the promoter of a subject nucleic acid is a synthetic promoter (i.e., the promoter is not a naturally occurring promoter, e.g., the promoter includes a nucleotide sequence having at least one mutation relative to a corresponding wild type promoter). In some cases, the promoter of a subject nucleic acid is a synthetic promoter that includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, or 99.8% or more) with the Bacteroides phage promoter sequence set forth as SEQ ID NO: 8. In some cases, the promoter of a subject nucleic acid is a synthetic promoter that includes a nucleotide sequence having 90% or more identity (e.g., 92% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, or 99.8% or more) with the Bacteroides phage promoter sequence set forth as SEQ ID NO: 8.

In some cases, the promoter of a subject nucleic acid is a synthetic promoter (i.e., the promoter is not a naturally occurring promoter, e.g., the promoter includes a nucleotide sequence having at least one mutation relative to a corresponding wild type promoter). In some cases, the promoter of a subject nucleic acid is a synthetic promoter that includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, or 99.8% or more) with the Bacteroides phage promoter sequence set forth as SEQ ID NO: 388. In some cases, the promoter of a subject nucleic acid is a synthetic promoter that includes a nucleotide sequence having 90% or more identity (e.g., 92% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, or 99.8% or more) with the Bacteroides phage promoter sequence set forth as SEQ ID NO: 388.

In some cases, the promoter of a subject nucleic acid is a synthetic promoter (i.e., the promoter is not a naturally occurring promoter, e.g., the promoter includes a nucleotide sequence having at least one mutation relative to a corresponding wild type promoter). In some cases, the promoter of a subject nucleic acid is a synthetic promoter that includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, or 99.8% or more) with the Bacteroides phage promoter sequence set forth as SEQ ID NO: 407 (or in some cases SEQ ID NO: 406). In some cases, the promoter of a subject nucleic acid is a synthetic promoter that includes a nucleotide sequence having 90% or more identity (e.g., 92% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, or 99.8% or more) with the Bacteroides phage promoter sequence set forth as SEQ ID NO: 407 (or in some cases SEQ ID NO: 406).

In some cases, the promoter of a subject nucleic acid is a synthetic promoter that includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, or 99.8% or more) with the promoter sequence set forth in any of SEQ ID NOs: 1-8. In some cases, the promoter of a subject nucleic acid is a synthetic promoter that includes a nucleotide sequence having 90% or more identity (e.g., 92% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, or 99.8% or more) with the promoter sequence set forth in any of SEQ ID NOs: 1-8.

In some cases, the promoter of a subject nucleic acid is a synthetic promoter that includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, or 99.8% or more) with the promoter sequence set forth in any of SEQ ID NOs: 388 and 406 (or in some cases SEQ ID NOs: 388 and 407). In some cases, the promoter of a subject nucleic acid is a synthetic promoter that includes a nucleotide sequence having 90% or more identity (e.g., 92% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, or 99.8% or more) with the promoter sequence set forth in any of SEQ ID NOs: 388 and 406 (or in some cases SEQ ID NOs: 388 and 407).

In some cases, the promoter of a subject nucleic acid is a synthetic promoter that includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, or 99.8% or more) with the promoter sequence set forth in any of SEQ ID NOs: 1-8 and 381-388. In some cases, the promoter of a subject nucleic acid is a synthetic promoter that includes a nucleotide sequence having 90% or more identity (e.g., 92% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, or 99.8% or more) with the promoter sequence set forth in any of SEQ ID NOs: 1-8 and 381-388.

In some cases, the promoter of a subject nucleic acid is a synthetic promoter that includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, or 99.8% or more) with the promoter sequence set forth in any of SEQ ID NOs: 1-8, 151-364, and 381-388. In some cases, the promoter of a subject nucleic acid is a synthetic promoter that includes a nucleotide sequence having 90% or more identity (e.g., 92% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, or 99.8% or more) with the promoter sequence set forth in any of SEQ ID NOs: 1-8, 151-364, and 381-388.

In some cases, the promoter of a subject nucleic acid is a synthetic promoter that includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the synthetic promoter sequence set forth in any of SEQ ID NOs: 1-7. In some cases, the promoter of a subject nucleic acid is a synthetic promoter that includes a nucleotide sequence having 90% or more identity (e.g., 92% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the synthetic promoter sequence set forth in any of SEQ ID NOs: 1-7. In some cases, the promoter of a subject nucleic acid is a synthetic promoter that includes the nucleotide sequence set forth in any of SEQ ID NOs: 1-7.

In some cases, the promoter of a subject nucleic acid is a synthetic promoter that includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the synthetic promoter sequence set forth in any of SEQ ID NOs: 381-387. In some cases, the promoter of a subject nucleic acid is a synthetic promoter that includes a nucleotide sequence having 90% or more identity (e.g., 92% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the synthetic promoter sequence set forth in any of SEQ ID NOs: 381-387. In some cases, the promoter of a subject nucleic acid is a synthetic promoter that includes the nucleotide sequence set forth in any of SEQ ID NOs: 381-387.

In some cases, the promoter of a subject nucleic acid is a synthetic promoter that includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the synthetic promoter sequence set forth in any of SEQ ID NOs: 1-7 and 381-387. In some cases, the promoter of a subject nucleic acid is a synthetic promoter that includes a nucleotide sequence having 90% or more identity (e.g., 92% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the synthetic promoter sequence set forth in any of SEQ ID NOs: 1-7 and 381-387. In some cases, the promoter of a subject nucleic acid is a synthetic promoter that includes the nucleotide sequence set forth in any of SEQ ID NOs: 1-7 and 381-387.

In some cases, the promoter of a subject nucleic acid is a synthetic promoter that includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the synthetic promoter sequence set forth in any of SEQ ID NOs: 1-7, 151-364, and 381-387. In some cases, the promoter of a subject nucleic acid is a synthetic promoter that includes a nucleotide sequence having 90% or more identity (e.g., 92% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the synthetic promoter sequence set forth in any of SEQ ID NOs: 1-7, 151-364, and 381-387. In some cases, the promoter of a subject nucleic acid is a synthetic promoter that includes the nucleotide sequence set forth in any of SEQ ID NOs: 1-7, 151-364, and 381-387.

TABLE 4
Promoters and ribosome binding site (RBS) sequences of the disclosure.

SEQ ID NO: | Description | Sequence | Length (nt)

Promoters
1 | sp1 (P_BfP3E1) | caattgggctaccttttttttgtaaaaaaaaaccccgcccctgacagggcggggttttttttttcacttgaactttcaaataatgttcttataaaaccagtgtcgaaagaaacaaagtag | 120
2 | sp2 (P_BfP2E2) | caattgggctaccttttttttgtaaaaaaaaaccccgcccctgacagggcggggttttttttttcacttgaactttcaaataatgttcttatatatgcagtgtcgaaagaaacaaagtag | 120
3 | sp3 (P_BfP2E3) | caattgggctaccttttttttgttttgtttgcaatggttaatctattgttaaaatttaaagtttcacttgaactttcaaataatgttcttatatgtgcagtgtcgaaagaaacaaagtag | 120
4 | sp4 (P_BfP1E4) | caattgggctaccttttttttgttttgtttgcaatggttaatctattgttaacatttaaagtttcacttgaactttcaaataatgttcttatattttcagtgtcgaaagaaacaaagtag | 120
5 | sp5 (P_BfP5E4) | caattgggctaccttttttttgttttgtttgcaatggttaatctattgttaaaatttaaagtttcacttgaactttcaaataatgttottctatttgcagtgtcgaaagaaacaaagtag | 120
6 | sp6 (P_BfP2E5) | caattgggctaccttttttttgttttgtttgcaatggttaatctattgttaaaatttaaagtttcacttgaactttcaaataatgttcttatatttccagtgtcgaaagaaacaaagtag | 120
7 | sp7 (P_BfP4E5) | caattgggctaccttttttttgttttgtttgcaatggttaatctattgttgaaatttaaagtttcacttgaactttcaaataatgttcttatatttgcagtgtcgaaagaaacaaagtag | 120
8 | WT phage promoter (P6) (P_BfP1E6) (−100, +20) | caattgggctaccttttttttgttttgtttgcaatggttaatctattgttaaaatttaaagtttcacttgaactttcaaataatgttcttatatttgcagTgtcgaaagaaacaaagtag | 120
406 | WT phage promoter (P5) (−94, +20) | gagtaactacgataataaagtgataattcaatgttaaaacagttaatgcacgttaaagtatttgctactgagaaatatatccgtatatttgcagcgtagaagttattactaacg | 114

DNA encoding Ribosomal Binding Sequences (RBSs)
10 | sr (synthetic RBS) | gactgatctatggattcaaaaaaatttaaaataatg | 36
11 | sr1 (RBS1) | gactgatcggcgcgactcacgcgccgatcagtaatg | 36
12 | sr2 (RBS2) | gactgatcgggaggagtaaaaaatattaaaataatg | 36
13 | sr3 (RBS3) | gactgatctctggggtgaataaaatttataataatg | 36
14 | sr4 (RBS4) | gactgatcccccattctattaaattttagaataatg | 36
15 | sr5 (RBS5) | gactgatcggtgttagctttaaatattagaataatg | 36
16 | sr6 (RBS6) | gactgatctagcactcttaaaaaaattaaaataatg | 36
17 | sr7 (RBS7) | gactgatcgtaatctttaaaaaaaataaaaataatg | 36
18 | sr8 (RBS8) | gactgatcgtccatcaatttaaaatttaaaataatg | 36

“sp”: synthetic promoter; “sr”: synthetic RBS. SEQ ID NOs: 1-7 are synthetic promoters (mutation of wild type phage promoter). SEQ ID NO: 8 is a wild type phage promoter. SEQ ID NOs: 10-18 are synthetic RBSs (i.e., include an altered sequence relative to wild type phage RBSs). SEQ ID NOs: 20-83 are promoter/RBS combinations from promoters of SEQ ID NOs 1-8 paired with RBSs of SEQ ID NOs 11-18 (64 combinations, promoter/RBS, length of each combination is 200 nucleotides (nt)). SEQ ID NOs: 28-83 are promoter/RBS combinations from promoters of SEQ ID NOs 2-8 paired with RBSs of SEQ ID NOs 11-18 (56 combinations, promoter/RBS, length of each combination is 200 nucleotides (nt)). SEQ ID NOs: 151-364 are additional synthetic promoters. SEQ ID NOs: 381-388 are truncated by 26 nt at the 5′ end and 3 nt at the 3′ end relative to SEQ ID NOs: 1-8, respectively (see −74 +17 of Table 6 and −100 +20 of Table 7).
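The relationship stated in the note to Table 4 between the full-length promoters (SEQ ID NOs: 1-8) and their truncated counterparts (SEQ ID NOs: 381-388) can be sketched as a simple slice. The function names are hypothetical and the input is a placeholder string, not a sequence from the Sequence Listing. The coordinate arithmetic assumes the usual convention in which position −1 is followed directly by +1, and that the trim does not cross that boundary.

```python
def truncate_promoter(seq: str, trim_5p: int = 26, trim_3p: int = 3) -> str:
    """Remove trim_5p nt from the 5' end and trim_3p nt from the 3' end."""
    if trim_5p < 0 or trim_3p < 0:
        raise ValueError("trim lengths must be non-negative")
    if trim_5p + trim_3p >= len(seq):
        raise ValueError("trimming would remove the whole sequence")
    return seq[trim_5p:len(seq) - trim_3p]


def truncated_coordinates(start: int, end: int,
                          trim_5p: int = 26, trim_3p: int = 3) -> tuple:
    """Shift (start, end) promoter coordinates inward by the trimmed lengths.

    Assumption: start is negative (upstream) and end is positive (downstream),
    and neither trim crosses the -1/+1 boundary, so plain addition is valid.
    """
    new_start, new_end = start + trim_5p, end - trim_3p
    if new_start >= 0 or new_end <= 0:
        raise ValueError("trim crosses the -1/+1 boundary; not handled by this sketch")
    return new_start, new_end


# Placeholder 120-nt string standing in for a full-length (-100, +20) promoter.
placeholder_promoter = "n" * 120
truncated = truncate_promoter(placeholder_promoter)
coords = truncated_coordinates(-100, 20)
```

With these defaults, a 120-nt input yields a 91-nt output, and the (−100, +20) window maps to (−74, +17), matching the coordinates cited in the note.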

In some embodiments, a subject nucleic acid includes, upstream (5′) of the promoter, a terminator sequence. For example, a terminator sequence located upstream can be used to reduce the chance that the operably linked sequence downstream (3′) of the subject promoter is transcribed as part of a transcript from an upstream promoter. In other words, a terminator sequence can be positioned 5′ of a subject promoter as an element that can terminate transcription from an upstream promoter. Any convenient terminator sequence can be used. When present in the working examples below, the terminator sequence gataaaacgaaaggctcagtcgaaagactgggcctttcgtttta (SEQ ID NO: 409) was used.

In some cases, a subject nucleic acid includes a terminator sequence downstream (3′) of a subject nucleotide sequence of interest in order to terminate transcription from the subject promoter (the promoter that is operably linked to the nucleotide sequence of interest).

Ribosomal Binding Site (RBS)

In some embodiments, a subject nucleic acid includes a nucleotide sequence encoding a ribosomal binding site (RBS), e.g., where the sequence encoding the RBS is operably linked to the promoter and is positioned between the promoter and the nucleotide sequence of interest. As such, in some cases, the RBS is positioned 3′ of the promoter. In some cases, the RBS is positioned 5′ of the nucleotide sequence of interest. In some cases, the RBS is positioned 3′ of the promoter and 5′ of the nucleotide sequence of interest.
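The 5′-to-3′ ordering described above (optional upstream terminator, promoter, RBS, nucleotide sequence of interest) can be sketched as an ordered concatenation that also records where each element starts. The `assemble` helper, the placeholder promoter, and the placeholder sequence of interest are hypothetical. Direct end-to-end joining with no spacers or cloning scars is an assumption made only for illustration. The terminator and RBS strings are copied from SEQ ID NO: 409 and SEQ ID NO: 11 (sr1) as given in the text.

```python
from typing import Dict, List, Tuple

UPSTREAM_TERMINATOR = "gataaaacgaaaggctcagtcgaaagactgggcctttcgtttta"  # SEQ ID NO: 409
RBS_SR1 = "gactgatcggcgcgactcacgcgccgatcagtaatg"  # SEQ ID NO: 11 (sr1)

# Hypothetical placeholders; substitute real sequences as needed.
PROMOTER_PLACEHOLDER = "n" * 120
SEQUENCE_OF_INTEREST_PLACEHOLDER = "n" * 30


def assemble(parts: List[Tuple[str, str]]) -> Tuple[str, Dict[str, int]]:
    """Join named parts in 5'->3' order; return the construct and 0-based start offsets."""
    offsets: Dict[str, int] = {}
    pieces: List[str] = []
    position = 0
    for name, seq in parts:
        offsets[name] = position
        pieces.append(seq)
        position += len(seq)
    return "".join(pieces), offsets


construct, offsets = assemble([
    ("upstream_terminator", UPSTREAM_TERMINATOR),
    ("promoter", PROMOTER_PLACEHOLDER),
    ("rbs", RBS_SR1),
    ("sequence_of_interest", SEQUENCE_OF_INTEREST_PLACEHOLDER),
])

# The RBS lies 3' of the promoter and 5' of the sequence of interest.
rbs_is_between = offsets["promoter"] < offsets["rbs"] < offsets["sequence_of_interest"]
```

A downstream terminator, where used, would be appended as a final element of the same list.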

Examples of nucleotide sequences encoding suitable RBS sequences include, but are not limited to, those presented in Table 4. For example, in some cases, the sequence encoding an RBS of a subject nucleic acid includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the nucleotide sequence set forth in any of SEQ ID NOs: 10-18. In some cases, the sequence encoding an RBS of a subject nucleic acid includes a nucleotide sequence having 90% or more identity (e.g., 92% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the nucleotide sequence set forth in any of SEQ ID NOs: 10-18. In some cases, the sequence encoding an RBS of a subject nucleic acid includes the nucleotide sequence set forth in any of SEQ ID NOs: 10-18.

In some cases, the sequence encoding an RBS of a subject nucleic acid includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the nucleotide sequence set forth in any of SEQ ID NOs: 11-18. In some cases, the sequence encoding an RBS of a subject nucleic acid includes a nucleotide sequence having 90% or more identity (e.g., 92% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the nucleotide sequence set forth in any of SEQ ID NOs: 11-18. In some cases, the sequence encoding an RBS of a subject nucleic acid includes the nucleotide sequence set forth in any of SEQ ID NOs: 11-18.

In some cases, the RBS of a subject nucleic acid is a synthetic RBS (i.e., includes a mutation relative to a corresponding naturally occurring RBS) and the sequence encoding the synthetic RBS includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the nucleotide sequence set forth in any of SEQ ID NOs: 11-18. In some cases, the RBS of a subject nucleic acid is a synthetic RBS (i.e., includes a mutation relative to a corresponding naturally occurring RBS) and the sequence encoding the synthetic RBS includes a nucleotide sequence having 90% or more identity (e.g., 92% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the nucleotide sequence set forth in any of SEQ ID NOs: 11-18. In some cases, the sequence encoding an RBS of a subject nucleic acid includes the nucleotide sequence set forth in any of SEQ ID NOs: 11-18.

In some cases, the RBS of a subject nucleic acid is a synthetic RBS (i.e., includes a mutation relative to a corresponding naturally occurring RBS) and the sequence encoding the synthetic RBS includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the nucleotide sequence set forth in any of SEQ ID NOs: 12-18. In some cases, the RBS of a subject nucleic acid is a synthetic RBS (i.e., includes a mutation relative to a corresponding naturally occurring RBS) and the sequence encoding the synthetic RBS includes a nucleotide sequence having 90% or more identity (e.g., 92% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the nucleotide sequence set forth in any of SEQ ID NOs: 12-18. In some cases, the sequence encoding an RBS of a subject nucleic acid includes the nucleotide sequence set forth in any of SEQ ID NOs: 12-18.

Promoter/RBS Combinations

Any of the above described promoters can be used in combination with any of the above described RBSs. For example, in some cases, a subject nucleic acid includes a promoter of Table 4 and an RBS of Table 4. In some cases, a subject nucleic acid includes a promoter of Table 4, Table 5, Table 6, or Table 7, and an RBS of Table 4.
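As an illustration of how promoters and RBSs pair up, the note to Table 4 states that SEQ ID NOs: 20-83 are the 64 combinations of promoters SEQ ID NOs: 1-8 with RBSs SEQ ID NOs: 11-18, and that SEQ ID NOs: 28-83 are the 56 combinations that use promoters SEQ ID NOs: 2-8. The sketch below enumerates the pairs in promoter-major order, which is consistent with that note. The order of RBSs within each promoter block is an assumption, and the function name is hypothetical.

```python
from itertools import product
from typing import Dict, Tuple

PROMOTER_IDS = range(1, 9)    # SEQ ID NOs: 1-8
RBS_IDS = range(11, 19)       # SEQ ID NOs: 11-18
FIRST_COMBINATION_ID = 20     # first SEQ ID NO of the combination series


def enumerate_combinations() -> Dict[int, Tuple[int, int]]:
    """Map each combination SEQ ID NO to a (promoter SEQ ID NO, RBS SEQ ID NO) pair.

    Assumption: promoter-major ordering, with RBSs in ascending order
    inside each promoter block.
    """
    combos: Dict[int, Tuple[int, int]] = {}
    for offset, (promoter_id, rbs_id) in enumerate(product(PROMOTER_IDS, RBS_IDS)):
        combos[FIRST_COMBINATION_ID + offset] = (promoter_id, rbs_id)
    return combos


combinations = enumerate_combinations()

# Subset described in the Table 4 note as using promoters SEQ ID NOs: 2-8.
subset_28_to_83 = {k: v for k, v in combinations.items() if k >= 28}
```

Under this ordering, the eight combinations that use the promoter of SEQ ID NO: 1 occupy SEQ ID NOs: 20-27, and the remaining 56 fall in SEQ ID NOs: 28-83.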

In some embodiments, a subject nucleic acid includes a promoter that includes a nucleotide sequence of a wild type (i.e., naturally occurring) promoter from a phage (e.g., a Bacteroides phage, i.e., a phage that infects Bacteroides cells), and an RBS (e.g., a wild type RBS, a synthetic RBS, an RBS of Table 4, and the like). In some cases, a subject nucleic acid includes a promoter that includes the Bacteroides phage promoter sequence set forth in any of SEQ ID NOs: 400-404; and an RBS (e.g., a wild type RBS, a synthetic RBS, an RBS of Table 4, and the like). In some cases, a subject nucleic acid includes a promoter that includes the Bacteroides phage promoter sequence set forth in any of SEQ ID NOs: 8, 388, 406, and 407; and an RBS (e.g., a wild type RBS, a synthetic RBS, an RBS of Table 4, and the like). In some cases, a subject nucleic acid includes a promoter that includes the Bacteroides phage promoter sequence set forth in any of SEQ ID NOs: 388 and 407; and an RBS (e.g., a wild type RBS, a synthetic RBS, an RBS of Table 4, and the like). In some cases, a subject nucleic acid includes a promoter that includes the Bacteroides phage promoter sequence set forth in SEQ ID NO: 8; and an RBS (e.g., a wild type RBS, a synthetic RBS, an RBS of Table 4, and the like). In some cases, a subject nucleic acid includes a promoter that includes the Bacteroides phage promoter sequence set forth in SEQ ID NO: 406; and an RBS (e.g., a wild type RBS, a synthetic RBS, an RBS of Table 4, and the like). In some cases, a subject nucleic acid includes a synthetic promoter (i.e., not naturally occurring, e.g., a sequence that has at least one mutation relative to a corresponding wild type promoter sequence); and an RBS (e.g., a wild type RBS, a synthetic RBS, an RBS of Table 4, and the like).

In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the promoter sequence set forth in any of SEQ ID NOs: 2-8; and a nucleotide sequence, encoding an RBS, having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the nucleotide sequence set forth in any of SEQ ID NOs: 10-18. In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence with the promoter sequence set forth in any of SEQ ID NOs: 2-8; and a nucleotide sequence, encoding an RBS, with the nucleotide sequence set forth in any of SEQ ID NOs: 10-18.

In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the promoter sequence set forth in any of SEQ ID NOs: 2-8; and a nucleotide sequence, encoding an RBS, having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the nucleotide sequence set forth in any of SEQ ID NOs: 11-18. In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence with the promoter sequence set forth in any of SEQ ID NOs: 2-8; and a nucleotide sequence, encoding an RBS, with the nucleotide sequence set forth in any of SEQ ID NOs: 11-18.

In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the promoter sequence set forth in any of SEQ ID NOs: 2-8; and a nucleotide sequence, encoding an RBS, having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the nucleotide sequence set forth in any of SEQ ID NOs: 12-18. In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence with the promoter sequence set forth in any of SEQ ID NOs: 2-8; and a nucleotide sequence, encoding an RBS, with the nucleotide sequence set forth in any of SEQ ID NOs: 12-18.

In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the promoter sequence set forth in any of SEQ ID NOs: 1-8; and a nucleotide sequence, encoding an RBS, having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the nucleotide sequence set forth in any of SEQ ID NOs: 10-18. In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence with the promoter sequence set forth in any of SEQ ID NOs: 1-8; and a nucleotide sequence, encoding an RBS, with the nucleotide sequence set forth in any of SEQ ID NOs: 10-18.

In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the promoter sequence set forth in any of SEQ ID NOs: 1-8; and a nucleotide sequence, encoding an RBS, having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the nucleotide sequence set forth in any of SEQ ID NOs: 11-18. In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence with the promoter sequence set forth in any of SEQ ID NOs: 1-8; and a nucleotide sequence, encoding an RBS, with the nucleotide sequence set forth in any of SEQ ID NOs: 11-18.

In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the promoter sequence set forth in any of SEQ ID NOs: 1-8; and a nucleotide sequence, encoding an RBS, having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the nucleotide sequence set forth in any of SEQ ID NOs: 12-18. In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence with the promoter sequence set forth in any of SEQ ID NOs: 1-8; and a nucleotide sequence, encoding an RBS, with the nucleotide sequence set forth in any of SEQ ID NOs: 12-18.

In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the promoter sequence set forth in any of SEQ ID NOs: 381-388; and a nucleotide sequence, encoding an RBS, having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the nucleotide sequence set forth in any of SEQ ID NOs: 10-18. In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence with the promoter sequence set forth in any of SEQ ID NOs: 381-388; and a nucleotide sequence, encoding an RBS, with the nucleotide sequence set forth in any of SEQ ID NOs: 10-18.

In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the promoter sequence set forth in any of SEQ ID NOs: 381-388; and a nucleotide sequence, encoding an RBS, having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the nucleotide sequence set forth in any of SEQ ID NOs: 11-18. In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence with the promoter sequence set forth in any of SEQ ID NOs: 381-388; and a nucleotide sequence, encoding an RBS, with the nucleotide sequence set forth in any of SEQ ID NOs: 11-18.

In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the promoter sequence set forth in any of SEQ ID NOs: 381-388; and a nucleotide sequence, encoding an RBS, having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the nucleotide sequence set forth in any of SEQ ID NOs: 12-18. In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence with the promoter sequence set forth in any of SEQ ID NOs: 381-388; and a nucleotide sequence, encoding an RBS, with the nucleotide sequence set forth in any of SEQ ID NOs: 12-18.

In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the promoter sequence set forth in any of SEQ ID NOs: 388 and 407 (or in some cases SEQ ID NOs: 388 and 406); and a nucleotide sequence, encoding an RBS, having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the nucleotide sequence set forth in any of SEQ ID NOs: 10-18. In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence with the promoter sequence set forth in any of SEQ ID NOs: 388 and 407 (or in some cases SEQ ID NOs: 388 and 406); and a nucleotide sequence, encoding an RBS, with the nucleotide sequence set forth in any of SEQ ID NOs: 10-18.

In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the promoter sequence set forth in any of SEQ ID NOs: 388 and 407 (or in some cases SEQ ID NOs: 388 and 406); and a nucleotide sequence, encoding an RBS, having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the nucleotide sequence set forth in any of SEQ ID NOs: 11-18. In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence with the promoter sequence set forth in any of SEQ ID NOs: 388 and 407 (or in some cases SEQ ID NOs: 388 and 406); and a nucleotide sequence, encoding an RBS, with the nucleotide sequence set forth in any of SEQ ID NOs: 11-18.

In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the promoter sequence set forth in any of SEQ ID NOs: 388 and 407 (or in some cases SEQ ID NOs: 388 and 406); and a nucleotide sequence, encoding an RBS, having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the nucleotide sequence set forth in any of SEQ ID NOs: 12-18. In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence with the promoter sequence set forth in any of SEQ ID NOs: 388 and 407 (or in some cases SEQ ID NOs: 388 and 406); and a nucleotide sequence, encoding an RBS, with the nucleotide sequence set forth in any of SEQ ID NOs: 12-18.

In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the promoter sequence set forth in any of SEQ ID NOs: 1-7; and a nucleotide sequence, encoding an RBS, having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the nucleotide sequence set forth in any of SEQ ID NOs: 10-18. In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence with the promoter sequence set forth in any of SEQ ID NOs: 1-7; and a nucleotide sequence, encoding an RBS, with the nucleotide sequence set forth in any of SEQ ID NOs: 10-18.

In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the promoter sequence set forth in any of SEQ ID NOs: 1-7; and a nucleotide sequence, encoding an RBS, having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the nucleotide sequence set forth in any of SEQ ID NOs: 11-18. In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence with the promoter sequence set forth in any of SEQ ID NOs: 1-7; and a nucleotide sequence, encoding an RBS, with the nucleotide sequence set forth in any of SEQ ID NOs: 11-18.

In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the promoter sequence set forth in any of SEQ ID NOs: 1-7; and a nucleotide sequence, encoding an RBS, having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the nucleotide sequence set forth in any of SEQ ID NOs: 12-18. In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence with the promoter sequence set forth in any of SEQ ID NOs: 1-7; and a nucleotide sequence, encoding an RBS, with the nucleotide sequence set forth in any of SEQ ID NOs: 12-18.

In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the promoter sequence set forth in any of SEQ ID NOs: 1-7 and 381-387; and a nucleotide sequence, encoding an RBS, having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the nucleotide sequence set forth in any of SEQ ID NOs: 10-18. In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence with the promoter sequence set forth in any of SEQ ID NOs: 1-7 and 381-387; and a nucleotide sequence, encoding an RBS, with the nucleotide sequence set forth in any of SEQ ID NOs: 10-18.

In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the promoter sequence set forth in any of SEQ ID NOs: 1-7 and 381-387; and a nucleotide sequence, encoding an RBS, having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the nucleotide sequence set forth in any of SEQ ID NOs: 11-18. In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence with the promoter sequence set forth in any of SEQ ID NOs: 1-7 and 381-387; and a nucleotide sequence, encoding an RBS, with the nucleotide sequence set forth in any of SEQ ID NOs: 11-18.

In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the promoter sequence set forth in any of SEQ ID NOs: 1-7 and 381-387; and a nucleotide sequence, encoding an RBS, having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the nucleotide sequence set forth in any of SEQ ID NOs: 12-18. In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence with the promoter sequence set forth in any of SEQ ID NOs: 1-7 and 381-387; and a nucleotide sequence, encoding an RBS, with the nucleotide sequence set forth in any of SEQ ID NOs: 12-18.

In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the promoter sequence set forth in any of SEQ ID NOs: 2-8, 151-364, and 382-388; and a nucleotide sequence, encoding an RBS, having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the nucleotide sequence set forth in any of SEQ ID NOs: 10-18. In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence with the promoter sequence set forth in any of SEQ ID NOs: 2-8, 151-364, and 382-388; and a nucleotide sequence, encoding an RBS, with the nucleotide sequence set forth in any of SEQ ID NOs: 10-18.

In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the promoter sequence set forth in any of SEQ ID NOs: 2-8, 151-364, and 382-388; and a nucleotide sequence, encoding an RBS, having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the nucleotide sequence set forth in any of SEQ ID NOs: 11-18. In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence with the promoter sequence set forth in any of SEQ ID NOs: 2-8, 151-364, and 382-388; and a nucleotide sequence, encoding an RBS, with the nucleotide sequence set forth in any of SEQ ID NOs: 11-18.

In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the promoter sequence set forth in any of SEQ ID NOs: 2-8, 151-364, and 382-388; and a nucleotide sequence, encoding an RBS, having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the nucleotide sequence set forth in any of SEQ ID NOs: 12-18. In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence with the promoter sequence set forth in any of SEQ ID NOs: 2-8, 151-364, and 382-388; and a nucleotide sequence, encoding an RBS, with the nucleotide sequence set forth in any of SEQ ID NOs: 12-18.

In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the promoter sequence set forth in any of SEQ ID NOs: 1-7, 151-364, and 381-387; and a nucleotide sequence, encoding an RBS, having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the nucleotide sequence set forth in any of SEQ ID NOs: 10-18. In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence with the promoter sequence set forth in any of SEQ ID NOs: 1-7, 151-364, and 381-387; and a nucleotide sequence, encoding an RBS, with the nucleotide sequence set forth in any of SEQ ID NOs: 10-18.

In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the promoter sequence set forth in any of SEQ ID NOs: 1-7, 151-364, and 381-387; and a nucleotide sequence, encoding an RBS, having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the nucleotide sequence set forth in any of SEQ ID NOs: 11-18. In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence with the promoter sequence set forth in any of SEQ ID NOs: 1-7, 151-364, and 381-387; and a nucleotide sequence, encoding an RBS, with the nucleotide sequence set forth in any of SEQ ID NOs: 11-18.

In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the promoter sequence set forth in any of SEQ ID NOs: 1-7, 151-364, and 381-387; and a nucleotide sequence, encoding an RBS, having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the nucleotide sequence set forth in any of SEQ ID NOs: 12-18. In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence with the promoter sequence set forth in any of SEQ ID NOs: 1-7, 151-364, and 381-387; and a nucleotide sequence, encoding an RBS, with the nucleotide sequence set forth in any of SEQ ID NOs: 12-18.

In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the promoter sequence set forth in any of SEQ ID NOs: 1-8, 151-364, and 381-388; and a nucleotide sequence, encoding an RBS, having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the nucleotide sequence set forth in any of SEQ ID NOs: 10-18. In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence with the promoter sequence set forth in any of SEQ ID NOs: 1-8, 151-364, and 381-388; and a nucleotide sequence, encoding an RBS, with the nucleotide sequence set forth in any of SEQ ID NOs: 10-18.

In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the promoter sequence set forth in any of SEQ ID NOs: 1-8, 151-364, and 381-388; and a nucleotide sequence, encoding an RBS, having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the nucleotide sequence set forth in any of SEQ ID NOs: 11-18. In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence with the promoter sequence set forth in any of SEQ ID NOs: 1-8, 151-364, and 381-388; and a nucleotide sequence, encoding an RBS, with the nucleotide sequence set forth in any of SEQ ID NOs: 11-18.

In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the promoter sequence set forth in any of SEQ ID NOs: 1-8, 151-364, and 381-388; and a nucleotide sequence, encoding an RBS, having 75% or more identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.8% or more, or 100% identity) with the nucleotide sequence set forth in any of SEQ ID NOs: 12-18. In some cases, a subject nucleic acid includes a promoter that includes a nucleotide sequence with the promoter sequence set forth in any of SEQ ID NOs: 1-8, 151-364, and 381-388; and a nucleotide sequence, encoding an RBS, with the nucleotide sequence set forth in any of SEQ ID NOs: 12-18.

In some cases, a subject nucleic acid includes a nucleotide sequence having the sequence of the promoter/RBS combination set forth in any of SEQ ID NOs: 20-83. In some cases, a subject nucleic acid includes a nucleotide sequence having the sequence of the promoter/RBS combination set forth in any of SEQ ID NOs: 28-83.
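The percent identity thresholds recited above can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned to equal length (with "-" marking gaps) and divides the number of matching columns by the number of aligned columns; other conventions for the denominator exist, and the disclosure does not prescribe one here. The function names and the two short sequences are hypothetical placeholders and are not taken from the Sequence Listing.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns (assumes pre-aligned input)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    # Drop columns that are gaps in both sequences; they carry no information.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    # A column counts as a match only if both residues are equal and not a gap.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 75.0) -> bool:
    """True if the identity is at or above the threshold (e.g., '75% or more')."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical placeholder sequences, for illustration only.
reference = "TTGACATATAAT"
candidate = "TTGACGTATAAT"  # one substitution relative to the reference

identity = percent_identity(reference, candidate)
print(f"{identity:.1f}% identity")
print("75% or more:", meets_threshold(reference, candidate, 75.0))
print("95% or more:", meets_threshold(reference, candidate, 95.0))
```

In practice the alignment itself would come from a dedicated alignment tool; this sketch only covers the counting step.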

Nucleotide Sequence of Interest

As noted above, provided are nucleic acids (e.g., expression vectors) that include a promoter sequence operably linked to a nucleotide sequence of interest. A nucleotide sequence of interest of a subject nucleic acid is operably linked to a promoter. The terms “operably linked” and “operable linkage” as used herein refer to a juxtaposition wherein the components so described are in a relationship permitting them to function in their intended manner. For instance, a promoter is operably linked to a nucleotide sequence if the promoter affects the transcription and/or expression of the nucleotide sequence. As another example, a ribosomal binding site (RBS) (e.g., a Shine-Dalgarno sequence, a synthetic RBS, and the like) is a site in an mRNA that facilitates the translation of the mRNA into protein. Thus, a subject nucleotide sequence of interest (e.g., one encoding an mRNA, i.e., one encoding a protein) is operably linked to a sequence encoding an RBS if, once transcribed into RNA, the RBS affects the translation of the transcribed nucleotide sequence of interest. Therefore, a sequence encoding an RBS can be operably linked to both a promoter and a nucleotide sequence of interest if the nucleotide sequence of interest is also operably linked to the same promoter. In other words, a promoter can be operably linked to both a sequence encoding an RBS and to a nucleotide sequence of interest, the sequence encoding the RBS can be operably linked to both the promoter and the nucleotide sequence of interest, and the nucleotide sequence of interest can be operably linked to both the promoter and the sequence encoding the RBS.

As used herein, for the purposes of this disclosure, it is equivalent to say that a ‘nucleotide sequence is operably linked to a promoter’ and to say that the ‘promoter is operably linked to the nucleotide sequence’ (or to say that the two are in operable linkage with one another). Likewise, it is equivalent to say that a ‘nucleotide sequence is operably linked to a sequence encoding an RBS’ and to say that the ‘sequence encoding an RBS is operably linked to the nucleotide sequence’ (or to say that the two sequences are in operable linkage with one another).
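As a rough illustration of the mutual operable linkage described above, the following Python sketch models a cassette in which a promoter, a sequence encoding an RBS, and a nucleotide sequence of interest are arranged 5′ to 3′ on one strand. The class name, its methods, and the three short strings are hypothetical placeholders (arbitrary strings, not functional Bacteroides elements and not sequences from the Sequence Listing). Real constructs typically also contain spacer sequences between the elements, which are omitted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpressionCassette:
    """Three elements in mutual operable linkage, listed 5' to 3'."""
    promoter: str
    rbs: str
    sequence_of_interest: str

    def dna(self) -> str:
        # Simple concatenation; spacers are omitted in this sketch.
        return self.promoter + self.rbs + self.sequence_of_interest

    def element_spans(self) -> dict:
        """Half-open (start, end) coordinates of each element in dna()."""
        p_end = len(self.promoter)
        r_end = p_end + len(self.rbs)
        s_end = r_end + len(self.sequence_of_interest)
        return {
            "promoter": (0, p_end),
            "rbs": (p_end, r_end),
            "sequence_of_interest": (r_end, s_end),
        }


# Hypothetical placeholder elements, for illustration only.
cassette = ExpressionCassette(
    promoter="TTGACATATAAT",
    rbs="AGGAGG",
    sequence_of_interest="ATGGCTTAA",
)

print(cassette.dna())
print(cassette.element_spans())
```

Keeping the three elements as separate fields makes it straightforward to swap one promoter or RBS for another while leaving the sequence of interest unchanged.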

A nucleotide sequence of interest can be any nucleotide sequence as long as the sequence is heterologous to the promoter to which it is operably linked. The term “heterologous,” e.g., with respect to a heterologous nucleotide sequence, is a relative term referring to a nucleotide sequence (e.g., a nucleotide sequence of interest) that is related to another nucleotide sequence (e.g., a promoter) in a manner so that the two sequences are not arranged in the same relationship to each other as in nature. Heterologous nucleotide sequences include, e.g., a heterologous nucleotide sequence operably linked to a promoter, and a nucleic acid including a native promoter that is inserted into a heterologous vector (e.g., for introduction into a cell). Two heterologous nucleotide sequences (e.g., a nucleotide sequence operably linked to a promoter) can originate from different sources (e.g., one from a phage and one from a cell) or from the same source (e.g., both from a phage or both from a cell). Thus, when a subject nucleotide sequence of interest is heterologous to the promoter to which it is operably linked, the nucleotide sequence of interest is a sequence that is not found in nature in operable linkage with the promoter. In other words, the combination of promoter and nucleotide sequence of interest of a subject nucleic acid is a combination that is not naturally occurring.

Transgenes

Examples of nucleotide sequences of interest include but are not limited to transgene sequences and insertion sites. For example, in some cases, a nucleotide sequence of interest is a transgene (e.g., a transgene that encodes a protein, a transgene that encodes a non-coding RNA, a transgene that encodes a coding RNA, i.e., an mRNA). As used herein, the term “transgene” can be used to refer to a nucleotide sequence of interest that (i) is operably linked to a promoter (e.g., a promoter functional in prokaryotic cells, e.g., Bacteroides cells), (ii) encodes an expression product (e.g., protein, mRNA, non-coding RNA), and (iii) is capable of being expressed in a target cell (e.g., a prokaryotic cell such as a Bacteroides cell). Non-limiting examples of transgenes include nucleotide sequences that encode a peptide or polypeptide (i.e., protein coding sequences, mRNA sequences), and nucleotide sequences that encode non-translated RNAs (non-coding RNA, ncRNA) (e.g., a guide RNA for a genome editing protein such as a CRISPR/Cas protein like Cas9; an RNA such as antisense RNA, siRNA, shRNA, and miRNA; and the like). In some cases, a transgene is operably linked to a promoter functional in prokaryotic cells (e.g., Bacteroides cells).

In some cases, a transgene is a “marker” or “marker gene” or “marker protein.” A marker is an expression product (e.g., mRNA, protein, non-coding RNA) that marks a host cell such that the host cell is detectable (e.g., detectably labeled). In some cases, the host cell is detectable by virtue of survival (e.g., the marker can be a “selectable marker”). In some cases, the host cell is detectable by observation (e.g., by direct visualization, by performing an assay, by performing a measurement step, and the like) and the marker can be referred to as a “reporter” or “reporter gene” or “reporter protein.”

As noted above, some markers are “selectable markers.” Selectable markers (a “selectable marker gene” can encode a “selectable marker protein”) provide for selection, i.e., for selective retention of cells (e.g., prokaryotic cells) that comprise the selectable marker gene, during culturing and propagation of the cells. An example of a selectable marker is a transgene that encodes a drug selectable marker protein that provides drug resistance for prokaryotic cells (e.g., Bacteroides cells). Such a selectable marker encodes a drug selectable marker protein that provides resistance for prokaryotic cells to one or more drugs (e.g., kanamycin, neomycin, ampicillin, carbenicillin, chloramphenicol, gentamicin, tetracycline, rifampin, trimethoprim, hygromycin B, spectinomycin, and the like). Proteins that provide drug resistance to cells (e.g., prokaryotic cells) in which they are expressed are known in the art. For example, wild type genes/proteins are known that provide resistance (e.g., for prokaryotic cells) to the above drugs. For example, aminoglycoside 3′-phosphotransferase (APH) is a wild type protein that provides for resistance to the drugs kanamycin, neomycin and Geneticin (G418); while beta-lactamase is a wild type protein that provides for resistance to the drugs ampicillin and carbenicillin. Chloramphenicol acetyltransferase (cat) confers resistance to chloramphenicol. Genes conferring resistance to aminoglycosides include aac, aad, aph and strA/B. Genes conferring resistance to β-lactams include ampC, cmy, tem and vim. Genes conferring resistance to sulfonamides include sulI and sulII. Genes conferring resistance to tetracycline include tet(A), tet(B), tet(C), tet(D) and regulator, and tetR. Selectable markers can also be those useful in balanced lethal systems, e.g., in which an essential gene is maintained on a plasmid with a corresponding chromosomal deletion or suppressible mutation on the host cell genome, e.g., a tRNA selectable marker that suppresses a host chromosomal gene mutation; those useful in repressor titration systems, in which an operator sequence, e.g., the lac operator or tet operator, placed on a plasmid, derepresses a chromosomal gene; antidote/poison selection schemes, in which an antidote to a poison expressed from the host chromosome (e.g., the ccdB gene) is maintained on the plasmid; and those useful in RNA-based selection schemes, e.g., antisense regulators, or antisense regulators that inhibit the translation of a gene transcribed from the host chromosome that would otherwise promote cell death.

Also as noted above, some markers are “reporters” or “reporter genes” or “reporter proteins.” A “reporter” is a marker that provides an identifiable characteristic (trait) to a cell that expresses the reporter such that the cell can be identified relative to cells not expressing the reporter. A reporter is detectable by observation (e.g., by direct visualization, by performing an assay, by performing a measurement step, and the like). For example, a fluorescent protein such as GFP (green fluorescent protein) can be considered a reporter because those cells that express the gene encoding GFP can be readily identified relative to those cells not expressing GFP. Likewise, an enzyme such as luciferase can be considered a reporter because those cells that express the gene encoding luciferase can be readily identified relative to those cells not expressing luciferase (e.g., by performing an assay in which a substrate for luciferase is converted by luciferase into a detectable product).

In some cases, a transgene encodes an enzyme (e.g., a metabolic enzyme). For example, there are many small molecules produced by microbes in the gut that accumulate in the blood and cause or exacerbate diseases. Expressing an enzyme or a pathway (as a transgene) in a Bacteroides cell (or population of cells) to break down these products can be used in methods of treatment. For example, a Bacteroides cell expressing such a transgene can be introduced into the gut of an individual (e.g., in order to break down small molecules produced by microbes to reduce or even eliminate the amount absorbed by the gut of the individual, reducing the accumulation of the molecules in the blood of the individual). As an illustration that this is achievable (e.g., as proof of principle) see, e.g., FIG. 2 e-2 f and FIG. 3 a of the working examples below, in which luciferase (an enzyme) was expressed and functional in Bacteroides that were introduced into the gut of an animal.

Secreted Fusion Proteins

In some embodiments, a transgene encodes a secreted protein (e.g., a therapeutic protein). For example, in some cases, a transgene encodes a secreted fusion protein that includes a polypeptide of interest and a secreted Bacteroides polypeptide (or secreted variant and/or fragment thereof). As used herein, the term “secreted” when referring to a protein product of a subject transgene, encompasses any route of being added into the extracellular environment. For example, in some cases, a subject polypeptide of interest is secreted by virtue of being fused to a secreted Bacteroides protein (e.g., BT0525) (e.g., see FIG. 23) that is secreted through the outer membrane. However, in some cases, a subject polypeptide of interest is secreted by virtue of being fused to a secreted Bacteroides protein (e.g., BT1488, SEQ ID NO: 484) that is released from outer membrane vesicles (see, e.g., Elhenawy et al, MBio. 2014 Mar. 11; 5(2); Hickey et al, Cell Host Microbe. 2015 May 13; 17(5):672-80; and Shen et al, Cell Host Microbe. 2012 Oct. 18; 12(4):509-20). For example, in some cases, the outer membrane buds off into small vesicles containing protein. Proteins secreted this way would be protected from degradation by gut proteases, and could also be delivered to the mammalian cell cytoplasm when those vesicles fuse to the cell membrane. Thus, in some cases the fusion protein is secreted through the outer membrane (e.g., when fused to BT0525), and in some cases the fusion protein is released from outer-membrane vesicles (e.g., when fused to BT1488), e.g., see FIG. 22.

The sequence of BT1488 is:

(SEQ ID NO: 484) MAIAATLLASCNKDEEETEIQGFKVLEYRPAPGQFINEGFDCQTMEEANAYAEERFNKKLYVSLGSFGGYITVKMPKEIKNRKGYDFGIIGNPFSGSSEPGIVWVSEDANGNGKADDVWYELKGSDEPERDYSVTYHRPDAAGDIPWEDNKGESGIIKYLPQYHDQMYYPNWIKEDSYTLKGSMLEARTEQEGGIWKNKDFGKGYADNWGSDMAKDDNGNYRYNQFDLDDAVDQNGNPVTLERIHFVKVQSAILKNVESIGEVSTEVVGFKAF

The term “secreted Bacteroides polypeptide” as used herein is meant to encompass any type of secretion, including those described in this paragraph.

As described in the examples section below, proteins were identified that are secreted by Bacteroides cells (e.g., see FIG. 17 a-17 e and FIG. 18 a-18 b). Provided herein are fusion proteins in which a polypeptide of interest is fused to a secreted Bacteroides protein or to a secreted variant (e.g., fragment) thereof. Examples of secreted Bacteroides proteins include but are not limited to those presented in FIG. 19 (SEQ ID NOs: 458-484). This list includes a set of proteins that were identified using proteomic data to be significantly over-represented in the supernatant (i.e., secreted) compared to what would be expected from a cell pellet (i.e., not secreted) from B. thetaiotaomicron cultures (e.g., see FIG. 17 a). As such, a polypeptide of interest can be fused to any one of the proteins set forth in SEQ ID NOs: 458-484, or to a secreted variant (e.g., sequence variant, fragment, etc.) thereof. While the amino acid sequences set forth in SEQ ID NOs: 458-484 are full length protein sequences, one of ordinary skill in the art using routine and conventional techniques would be readily able to identify fragments and/or variants thereof that are also secreted. As would be recognized by one of ordinary skill in the art, because the purpose of fusing a polypeptide of interest (e.g., by fusing a nucleotide sequence encoding a polypeptide of interest to a nucleotide sequence encoding a secreted Bacteroides protein) to a secreted Bacteroides protein is to use the secreted Bacteroides protein as a carrier to deliver the polypeptide of interest into the extracellular space, the exact sequence and/or length of the secreted Bacteroides protein (or fragment thereof) is not crucial.

Thus, when using the terms “secreted Bacteroides protein” and “secreted Bacteroides polypeptide” herein, it is meant any protein (e.g., including any full length protein, variant, and/or fragment thereof) that is secreted by a Bacteroides cell into the extracellular space (e.g., via outer membrane vesicle release, via secretion across the outer membrane). As such, the terms encompass fusion proteins that include the entire full length sequence of a secreted Bacteroides protein (e.g., a naturally secreted Bacteroides protein), but also encompass fusion proteins that include secreted variants and/or secreted fragments of a secreted Bacteroides protein.

As noted above, in some cases, the secreted Bacteroides protein of a subject fusion protein includes the amino acid sequence set forth in any one of SEQ ID NOs: 458-484, or is a secreted variant and/or fragment thereof. In some cases, the secreted Bacteroides protein of a subject fusion protein is BT0525 (SEQ ID NO: 459) (or a secreted variant and/or secreted fragment thereof). Thus, in some cases, the secreted Bacteroides protein of a subject fusion protein is a secreted variant and/or a secreted fragment of BT0525 (SEQ ID NO: 459). In some cases the fusion protein is secreted through the outer membrane (e.g., when fused to BT0525). In some cases the fusion protein is released from outer-membrane vesicles (e.g., when fused to BT1488, SEQ ID NO: 484) (e.g., see FIG. 22).

In some cases, a secreted Bacteroides protein of a subject secreted fusion protein has an amino acid sequence having 80% or more (85% or more, 90% or more, 92% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, or 100%) sequence identity with the amino acid sequence set forth in any of SEQ ID NOs: 458-484. In some cases, a secreted Bacteroides protein of a subject secreted fusion protein has an amino acid sequence having 80% or more (85% or more, 90% or more, 92% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, or 100%) sequence identity with the amino acid sequence set forth in any of SEQ ID NOs: 458-484 over a stretch of 20 or more amino acids. In some cases, a secreted Bacteroides protein of a subject secreted fusion protein has an amino acid sequence having 80% or more (85% or more, 90% or more, 92% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, or 100%) sequence identity with the amino acid sequence set forth in SEQ ID NO: 459. In some cases, a secreted Bacteroides protein of a subject secreted fusion protein has an amino acid sequence having 80% or more (85% or more, 90% or more, 92% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, or 100%) sequence identity with the amino acid sequence set forth in SEQ ID NO: 459 over a stretch of 20 or more amino acids.
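The notion of sequence identity "over a stretch of 20 or more amino acids" can be sketched in Python as a sliding-window calculation. The sketch assumes the two sequences are already aligned to equal length and reports the highest identity found in any window of the chosen length; the function name is hypothetical and this is one plausible reading, not a definition given in the disclosure. The reference below is the first 30 residues of SEQ ID NO: 484 as reproduced above, and the variant is an invented example in which the first five residues are replaced.

```python
def best_window_identity(aligned_a: str, aligned_b: str, window: int = 20) -> float:
    """Highest percent identity over any contiguous window of the given length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if window < 1 or len(aligned_a) < window:
        raise ValueError("window must be between 1 and the alignment length")
    # 1 where the aligned residues match (gaps never count as matches), else 0.
    flags = [
        int(a == b and a != "-")
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
    ]
    running = sum(flags[:window])
    best = running
    # Slide the window one column at a time, updating the match count.
    for i in range(window, len(flags)):
        running += flags[i] - flags[i - window]
        best = max(best, running)
    return 100.0 * best / window


# First 30 residues of SEQ ID NO: 484 (BT1488), as given in this section.
reference = "MAIAATLLASCNKDEEETEIQGFKVLEYRP"
# Invented variant: first five residues replaced, remainder unchanged.
variant = "GGGGG" + reference[5:]

print(best_window_identity(reference, variant, window=20))
print(round(best_window_identity(reference, variant, window=30), 2))
```

Here the variant is fully identical to the reference over at least one 20-residue stretch even though its identity over all 30 residues is lower.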

The polypeptide of interest of a subject secreted fusion protein can be any polypeptide. In some cases, the polypeptide of interest is a therapeutic peptide (e.g., a metabolic enzyme or a peptide that can, when secreted from a Bacteroides cell, e.g., in the gut of an individual, have a positive impact on a clinical parameter of the individual). For example, see below for methods of delivering and for methods of treating. Examples of therapeutic peptides include but are not limited to metabolic enzymes (e.g., as discussed elsewhere herein) and anti-inflammatory peptides, which can include but are not limited to those presented in Table 8 (SEQ ID NOs: 411-417). In some cases, the polypeptide of interest includes an amino acid sequence selected from: RYTVELA (SEQ ID NO: 411) (Peptide 101.10), VTLVGNTFLQSTINRTIGVL (SEQ ID NO: 412) (Fp MAM-pep5), and MQPPGC (SEQ ID NO: 413) (CD80-CAP1). In some cases, the polypeptide of interest includes an amino acid sequence selected from: RYTVELA (SEQ ID NO: 411) (Peptide 101.10), and VTLVGNTFLQSTINRTIGVL (SEQ ID NO: 412) (Fp MAM-pep5). In some cases, the polypeptide of interest includes the amino acid sequence RYTVELA (SEQ ID NO: 411) (Peptide 101.10). In some cases, the polypeptide of interest includes the amino acid sequence VTLVGNTFLQSTINRTIGVL (SEQ ID NO: 412) (Fp MAM-pep5).

TABLE 8
Examples of therapeutic peptides (polypeptides of interest) that can be fused to a secreted Bacteroides protein to form a subject secreted fusion protein.

Peptide        AA sequence             SEQ ID NO   Type
101.10         RYTVELA                 411         IL-1 inhibitory peptides
Fp MAM-pep5    VTLVGNTFLQSTINRTIGVL    412         anti-NF-κB
CD80-CAP1      MQPPGC                  413         CD80 antagonistic peptide
Pep2305        TEEEQQLY                414         IL-23 inhibitory peptides
KPV            KPV                     415         NF-κB and MAPK inhibition
WP9QY          YCWSQYLCY               416         anti-TNF
P144           TSLDASIIWAMMQN          417         TGF-β inhibitory peptide

In some embodiments, a subject secreted fusion protein includes more than one polypeptide of interest (e.g., two or more, three or more, or four or more polypeptides of interest). In some such cases, the polypeptides of interest can be separated by linkers (e.g., cleavable linkers).

A subject polypeptide of interest of a fusion protein can have any desirable length. For example, in the case of a secreted fusion protein, the polypeptide of interest can have any desirable length as long as the polypeptide of interest is secreted from the cell (e.g., secreted as part of the fusion protein and in some cases separated from the fusion after secretion via cleavage of a linker, secreted by the cell after cleavage of a cleavable linker, and the like). In some embodiments, a polypeptide of interest has a length of 2 amino acids or more (e.g., 3 amino acids or more, 5 amino acids or more, 6 amino acids or more, 7 amino acids or more, or 10 amino acids or more). In some cases, a polypeptide of interest has a length in a range of from 2 to 1000 amino acids (e.g., 2 to 500, 2 to 300, 2 to 200, 2 to 100, 2 to 75, 2 to 50, 2 to 30, 2 to 25, 2 to 20, 3 to 1000, 3 to 500, 3 to 300, 3 to 200, 3 to 100, 3 to 75, 3 to 50, 3 to 30, 3 to 25, 3 to 20, 5 to 1000, 5 to 500, 5 to 300, 5 to 200, 5 to 100, 5 to 75, 5 to 50, 5 to 30, 5 to 25, or 5 to 20 amino acids). In some cases, a polypeptide of interest has a length in a range of from 3 to 50 amino acids (e.g., 3 to 30, 3 to 25, 3 to 20, 5 to 50, 5 to 30, 5 to 25, or 5 to 20 amino acids). In some cases, a polypeptide of interest has a length in a range of from 6 to 40 amino acids (e.g., 6 to 30, 6 to 25, 6 to 20, 7 to 40, 7 to 30, 7 to 25, or 7 to 20 amino acids).

In some cases, the polypeptide of interest (e.g., therapeutic peptide) of a subject secreted fusion protein is fused to a secreted Bacteroides protein (or secreted variant and/or fragment thereof) via a linker (i.e., a linker is positioned between the secreted Bacteroides protein and the polypeptide of interest). Thus, in some cases, a subject fusion protein includes a linker and a secreted Bacteroides protein fused to a heterologous polypeptide of interest, where the linker is positioned between the secreted Bacteroides protein and the polypeptide of interest. In some cases, the linker is a cleavable linker. In some cases, a cleavable linker is a self-cleaving linker (e.g., a 2A peptide, an intein, etc.). In some such cases a cleavable linker is cleavable by one or more gut proteases. When a subject secreted fusion protein includes a polypeptide of interest (e.g., therapeutic peptide) fused to a secreted Bacteroides protein (or secreted variant and/or fragment thereof) via a linker that is cleavable by one or more gut proteases, the polypeptide of interest will be released from the secreted Bacteroides protein only after secretion and only when the extracellular environment (e.g., an animal gut) includes an appropriate corresponding protease.
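The arrangement described above, with a linker positioned between the secreted Bacteroides protein and the polypeptide of interest, can be sketched in Python as a simple concatenation with a few sanity checks. Several things here are assumptions made for illustration: the carrier is placed at the N-terminus (the order is not specified in this passage), the carrier and linker strings are hypothetical placeholders, and the length bounds are only one of the ranges recited in this section (2 to 50 amino acids for the linker, 2 to 1000 for the polypeptide of interest). The peptide RYTVELA is SEQ ID NO: 411.

```python
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard one-letter codes


def build_fusion(carrier: str, linker: str, peptide: str) -> str:
    """Return carrier + linker + peptide after basic validation.

    Assumes an N-terminal carrier; this order is an illustrative choice.
    """
    parts = (("carrier", carrier), ("linker", linker), ("peptide", peptide))
    for label, seq in parts:
        unknown = set(seq.upper()) - AMINO_ACIDS
        if unknown:
            raise ValueError(f"{label} has non-standard residues: {sorted(unknown)}")
    # Illustrative bounds taken from one of the ranges recited in this section.
    if not 2 <= len(linker) <= 50:
        raise ValueError("linker length outside the illustrative 2-50 range")
    if not 2 <= len(peptide) <= 1000:
        raise ValueError("peptide length outside the illustrative 2-1000 range")
    return (carrier + linker + peptide).upper()


carrier = "MKKLA"     # hypothetical placeholder, not a real secreted protein
linker = "GSGR"       # hypothetical linker, chosen only for illustration
peptide = "RYTVELA"   # Peptide 101.10 (SEQ ID NO: 411)

fusion = build_fusion(carrier, linker, peptide)
print(fusion, len(fusion))
```

A real design would use a full-length secreted Bacteroides protein (or a secreted fragment or variant) as the carrier and a linker selected for the intended protease.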

In some cases, a cleavable linker is cleavable by one or more host cell proteases (e.g., proteases of a Bacteroides cell or proteases of a cell of the host animal's gut) (e.g., an extracellular protease such as a matrix metalloproteinase, or an endopeptidase-2; an intracellular protease such as a cysteine protease or a serine protease; and the like). As an illustrative example, a subject polypeptide of interest can be fused to a secreted Bacteroides protein such that the fusion protein is incorporated into outer membrane vesicles (OMVs) that are released from the Bacteroides cell and then fuse with a host animal's cell, thus delivering the polypeptide of interest into the cytoplasm of a host animal's cell. In this scenario, a cleavable linker can be cleavable by a eukaryotic cytoplasmic protease. When a subject secreted fusion protein includes a polypeptide of interest (e.g., therapeutic peptide) fused to a secreted Bacteroides protein (or secreted variant and/or fragment thereof) via a linker that is cleavable by one or more host cell proteases (e.g., an extracellular and/or intracellular host cell protease), the polypeptide of interest will be released from the secreted Bacteroides protein only after secretion and only when the environment (e.g., animal cell's cytoplasm) includes an appropriate corresponding protease.

Any convenient cleavable linker can be used, and many ‘target’ gut proteases (and their corresponding cleavable linker sequences) will be known to one of ordinary skill in the art. Examples of gut proteases include but are not limited to those listed in Table 9. Thus, in some cases, a cleavable linker of a subject secreted fusion protein is cleavable by one or more gut proteases (also referred to herein as target peptidases) selected from: a trypsin, a chymotrypsin, and an elastase. In some cases, a cleavable linker of a subject secreted fusion protein is cleavable by one or more gut proteases selected from: chymotrypsin-like elastase family member 2A, anionic trypsin-2, chymotrypsin-C, chymotrypsinogen B, elastase 1, and elastase 3. In some cases, a cleavable linker of a subject secreted fusion protein is cleavable by one or more gut proteases selected from: trypsin, chymotrypsin (e.g., chymotrypsin B), and elastase (e.g., elastase 1, elastase 3). In some cases, a cleavable linker of a subject secreted fusion protein is cleavable by one or more gut proteases selected from: trypsin, chymotrypsin, chymotrypsin B, and elastase (e.g., elastase 1, elastase 3).

TABLE 9
Gut enzymes and cleavage preferences

Target gut protease | Preferential cleavage (Uniprot)
Chymotrypsin-like elastase family member 2A | Leu (L), Met (M), and Phe (F)
Anionic trypsin-2 | Arg (R), Lys (K)
Chymotrypsin-C | Leu (L), Tyr (Y), Phe (F), Met (M), Trp (W), Gln (Q), Asn (N)
Chymotrypsinogen B | Tyr (Y), Trp (W), Phe (F), Leu (L)
Elastase 1 | Ala (A)
Elastase 3 | Ala (A)
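As a rough illustration of how the preferences in Table 9 might be used when choosing a linker, the following Python sketch lists the Table 9 proteases whose preferred residues include the final residue of a candidate linker. The function name `candidate_proteases` is hypothetical. The sketch assumes that cleavage is governed only by the last residue of the linker; it ignores extended recognition motifs and is not a prediction of actual cleavage.

```python
# Preferred cleavage residues, transcribed from Table 9.
CLEAVAGE_PREFERENCES = {
    "Chymotrypsin-like elastase family member 2A": {"L", "M", "F"},
    "Anionic trypsin-2": {"R", "K"},
    "Chymotrypsin-C": {"L", "Y", "F", "M", "W", "Q", "N"},
    "Chymotrypsinogen B": {"Y", "W", "F", "L"},
    "Elastase 1": {"A"},
    "Elastase 3": {"A"},
}


def candidate_proteases(linker: str) -> list[str]:
    """List Table 9 proteases whose preferred residues include the linker's last residue.

    Assumption (not from the source): only the final residue is considered.
    """
    if not linker:
        raise ValueError("linker must be a non-empty amino acid sequence")
    last = linker[-1].upper()
    return [name for name, residues in CLEAVAGE_PREFERENCES.items() if last in residues]


if __name__ == "__main__":
    # SGPTGHGR is linker CL1 of Table 10 (target peptidase: trypsin).
    print(candidate_proteases("SGPTGHGR"))
```

Under this simplification a linker ending in R is matched only to anionic trypsin-2, while a linker ending in F is matched to several chymotrypsin-family entries, so the result is a list of candidates rather than a single target.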

A linker (e.g., cleavable linker) can have any convenient length. In some embodiments, a linker (e.g., cleavable linker) has a length of 2 amino acids or more (e.g., 3 amino acids or more, 5 amino acids or more, 6 amino acids or more, 7 amino acids or more, or 10 amino acids or more). In some cases, a linker (e.g., cleavable linker) has a length in a range of from 2 to 50 amino acids (e.g., 2 to 30, 2 to 25, 2 to 20, 2 to 15, 2 to 10, 2 to 8, 3 to 50, 3 to 30, 3 to 25, 3 to 20, 3 to 15, 3 to 10, 3 to 8, 5 to 50, 5 to 30, 5 to 25, 5 to 20, 5 to 15, 5 to 10, 5 to 8, 8 to 50, 8 to 30, 8 to 25, 8 to 20, 8 to 15, or 8 to 10 amino acids). In some cases, a linker (e.g., cleavable linker) has a length in a range of from 4 to 20 amino acids (e.g., 5 to 20, 5 to 15, 5 to 10, 5 to 8, 8 to 20, 8 to 15, or 8 to 10 amino acids).

A cleavable linker can include one or more (e.g., 2 or more, 3 or more, 4 or more, or 5 or more) non-cleavable amino acids followed by a cleavable amino acid. In some cases, a cleavable linker includes in a range of from 2 to 50 non-cleavable amino acids (e.g., 2 to 25, 2 to 20, 2 to 15, 2 to 10, 2 to 8, 2 to 5, 5 to 50, 5 to 25, 5 to 20, 5 to 15, 5 to 10, or 5 to 8 non-cleavable amino acids) followed by a cleavable amino acid. In some cases, a cleavable linker includes in a range of from 2 to 10 non-cleavable amino acids (e.g., 2 to 8, 2 to 5, 5 to 10, or 5 to 8 non-cleavable amino acids) followed by a cleavable amino acid.

In some cases the one or more (e.g., 2 or more, 3 or more, 4 or more, or 5 or more) non-cleavable amino acids are selected from S, G, T, P, M, H, A, D, E, N, and V. In some cases the one or more (e.g., 2 or more, 3 or more, 4 or more, or 5 or more) non-cleavable amino acids are selected from S, G, T, P, M, H, and A. In some cases the one or more (e.g., 2 or more, 3 or more, 4 or more, or 5 or more) non-cleavable amino acids are selected from S, G, T, P, and A.

In some cases, the cleavable amino acid is selected from R, L, F, A, K, M, W, Q, Y, and L. In some cases, the cleavable amino acid is selected from R, L, F, and A.

In some cases, a cleavable linker includes one or more (e.g., 2 or more, 3 or more, 4 or more, or 5 or more) non-cleavable amino acids selected from S, G, T, P, M, H, A, D, E, N, and V followed by a cleavable amino acid selected from: R, L, F, A, K, M, W, Q, Y, and L. In some cases, a cleavable linker includes one or more (e.g., 2 or more, 3 or more, 4 or more, or 5 or more) non-cleavable amino acids selected from S, G, T, P, M, H, and A followed by a cleavable amino acid selected from: R, L, F, A, K, M, W, Q, Y, and L. In some cases, a cleavable linker includes one or more (e.g., 2 or more, 3 or more, 4 or more, or 5 or more) non-cleavable amino acids selected from S, G, T, P, and A followed by a cleavable amino acid selected from: R, L, F, A, K, M, W, Q, Y, and L. In some cases, a cleavable linker includes one or more (e.g., 2 or more, 3 or more, 4 or more, or 5 or more) non-cleavable amino acids selected from S, G, T, P, M, H, A, D, E, N, and V followed by a cleavable amino acid selected from: R, L, F, and A. In some cases, a cleavable linker includes one or more (e.g., 2 or more, 3 or more, 4 or more, or 5 or more) non-cleavable amino acids selected from S, G, T, P, M, H, and A followed by a cleavable amino acid selected from: R, L, F, and A. In some cases, a cleavable linker includes one or more (e.g., 2 or more, 3 or more, 4 or more, or 5 or more) non-cleavable amino acids selected from S, G, T, P, and A followed by a cleavable amino acid selected from: R, L, F, and A. In some cases, a cleavable linker includes one or more (e.g., 2 or more, 3 or more, 4 or more, or 5 or more) non-cleavable amino acids selected from S, G, T, P, and A followed by a P followed by a cleavable amino acid selected from: R, L, F, and A (e.g., followed by a P followed by an F).
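The composition rule described above (one or more non-cleavable amino acids followed by a single cleavable amino acid) can be sketched as a simple check. The following Python example is illustrative only: the function name `is_candidate_cleavable_linker` and its parameters are hypothetical, the residue sets are the broadest sets recited above, and the default length bounds assume the 2 to 50 amino acid range given for linker length.

```python
# Broadest residue sets recited in the text (single-letter amino acid codes).
NON_CLEAVABLE = set("SGTPMHADENV")
CLEAVABLE = set("RLFAKMWQY")  # the recited list names L twice; a set keeps it once


def is_candidate_cleavable_linker(seq: str, min_len: int = 2, max_len: int = 50) -> bool:
    """Check the pattern: non-cleavable residue(s) followed by one cleavable residue.

    Assumption (not from the source): only composition and length are checked;
    nothing here predicts whether a protease will actually cleave the linker.
    """
    seq = seq.upper()
    if len(seq) < 2 or not (min_len <= len(seq) <= max_len):
        return False
    body, last = seq[:-1], seq[-1]
    return all(res in NON_CLEAVABLE for res in body) and last in CLEAVABLE


if __name__ == "__main__":
    for linker in ("SGPTTAPF", "TAPF", "SGPTGHGS"):
        print(linker, is_candidate_cleavable_linker(linker))
```

Narrower embodiments (e.g., non-cleavable residues limited to S, G, T, P, and A, with the cleavable residue limited to R, L, F, and A) could be checked by swapping in smaller sets.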

Motifs for various gut proteases are known in the art. For example, a motif for chymotrypsin is A; followed by A; followed by a P or a V; followed by an F, Y, L, or W. Examples of suitable cleavable linkers include, but are not limited to, those presented in Tables 10 and 11. Additional examples of suitable cleavable linkers include, but are not limited to, those that include one or more (e.g., 2 or more, 3 or more, 4 or more, or 5 or more) non-cleavable amino acids (e.g., selected from S, G, T, P, and A) followed by any one of the sequences set forth in SEQ ID NOs: 427-453. In some cases, a cleavable linker includes an amino acid sequence selected from the sequences set forth in SEQ ID NOs: 427-453. In some cases, a cleavable linker includes the amino acid sequence TAPF (SEQ ID NO: 433).

TABLE 10
Examples of cleavable linker sequences and their target peptidase

Linker | Amino acid sequence (cleavage at bold amino acid) | Target peptidase | SEQ ID NO:
CL1 | SGPTGHGR | Trypsin | 422
CL2 | SGPTGMAR | Trypsin | 423
CL3 | SGPTASPL | Chymotrypsin | 424
CL4 | SGPTTAPF | Chymotrypsin B | 425
CL5 | SGPTAAPA | Elastase 1 | 426

TABLE 11
Examples of cleavable linker sequences

Linker | SEQ ID NO:
GHGR | 427
GMAR | 428
ASPL | 429
VPY | 430
TAPY | 431
VPF | 432
TAPF | 433
STAPF | 434
GTAPF | 435
TTAPF | 436
PTAPF | 437
SSTAPF | 438
GSTAPF | 439
TSTAPF | 440
PSTAPF | 441
SGTAPF | 442
TGTAPF | 443
GGTAPF | 444
PGTAPF | 445
STTAPF | 446
GTTAPF | 447
TTTAPF | 448
PTTAPF | 449
SPTAPF | 450
GPTAPF | 451
TPTAPF | 452
PPTAPF | 453

Insertion Sites

In some cases, a nucleotide sequence of interest of a subject nucleic acid (e.g., a vector such as a plasmid) is an insertion site. In some cases a subject nucleic acid of interest includes an insertion site in addition to a second nucleotide sequence of interest, such as any of those described above (e.g., a transgene, a sequence encoding a fusion protein, etc.). An insertion site is a nucleotide sequence used for the insertion of a desired sequence. For example, an insertion site can be a sequence in the nucleic acid at which a transgene sequence will later be inserted. “Insertion sites” for use with various technologies are known to those of ordinary skill in the art and any convenient insertion site can be used. An insertion site can be for any method for manipulating nucleic acid sequences. For example, in some cases the insertion site is a multiple cloning site (MCS) (e.g., a site including one or more restriction enzyme recognition sequences), a site for ligation independent cloning, a site for recombination based cloning (e.g., recombination based on att sites), a nucleotide sequence recognized by a CRISPR/Cas (e.g., Cas9) based technology, and the like.

An insertion site can be any desirable length, and the length can depend on the type of insertion site (e.g., can depend on whether the site includes one or more restriction enzyme recognition sequences (and how many), whether the site includes a target site for a CRISPR/Cas protein, etc.). In some cases, an insertion site of a subject nucleic acid is 3 or more nucleotides (nt) in length (e.g., 5 or more, 8 or more, 10 or more, 15 or more, 17 or more, 18 or more, 19 or more, 20 or more, 25 or more, or 30 or more nt in length). In some cases, an insertion site of a subject nucleic acid has a length in a range of from 2 to 50 nucleotides (nt) (e.g., from 2 to 40 nt, from 2 to 30 nt, from 2 to 25 nt, from 2 to 20 nt, from 5 to 50 nt, from 5 to 40 nt, from 5 to 30 nt, from 5 to 25 nt, from 5 to 20 nt, from 10 to 50 nt, from 10 to 40 nt, from 10 to 30 nt, from 10 to 25 nt, from 10 to 20 nt, from 17 to 50 nt, from 17 to 40 nt, from 17 to 30 nt, or from 17 to 25 nt). In some cases, an insertion site of a subject nucleic acid has a length in a range of from 5 to 40 nt.

In some cases, an insertion site is said to be operably linked to a promoter. In general, the intent of an insertion site is that this region of the nucleic acid will get modified (e.g., in some cases replaced) to include a nucleotide sequence encoding a transgene of interest (e.g., a transgene encoding a non-coding RNA, a transgene encoding a protein, etc.), such that the inserted transgene sequence will, once inserted, be operably linked to the promoter to which the insertion site was/is operably linked. Likewise, in some cases, an insertion site is said to be operably linked to a sequence encoding an RBS. In such cases, the intent is that an inserted transgene sequence will, once inserted, be operably linked to the RBS to which the insertion site was/is operably linked.

For example, in some cases a subject nucleic acid includes an insertion site operably linked to a promoter. In some such cases, the nucleic acid could later be modified by inserting a transgene sequence into the insertion site, and in some such cases (e.g., if the transgene sequence encodes a protein), the sequence to be inserted may include a sequence encoding an RBS upstream of a transgene sequence.

In some cases, a subject nucleic acid includes an insertion site operably linked to a promoter and operably linked to a sequence encoding an RBS. In some such cases, the nucleic acid could later be modified by inserting a transgene sequence (e.g., a transgene sequence encoding a protein) into the insertion site such that the inserted sequence will, once inserted, be operably linked to both the promoter and the sequence encoding an RBS to which the insertion site was/is operably linked.

Nucleic Acids

In some embodiments, a subject nucleic acid is a vector. By a “vector” it is meant a nucleic acid that is capable of transferring a polynucleotide sequence, e.g., a transgene, to a target cell. For the purposes of the present disclosure, “vector construct” and “expression vector” generally refer to any nucleic acid construct, for example, a linear nucleic acid, a circular nucleic acid, a phage, a virus, a viral genome (a viral construct), a cosmid, a plasmid, and the like, that is capable of transferring a nucleotide sequence of interest (e.g., a transgene) into target cells (e.g., prokaryotic cells such as Bacteroides cells). Thus, the term includes cloning and expression vehicles, and extrachromosomally maintained vectors as well as integrating vectors.

In some cases, a subject expression vector is a linear nucleic acid vector. In some cases, a subject expression vector is a circular nucleic acid. In some cases, a subject expression vector can be maintained extrachromosomally, or “episomally” in the target cell, i.e., as a linear or circular nucleic acid that does not integrate into the target cell genome. In some cases, a subject expression vector can integrate into the genome of the host, i.e., as a linear or circular nucleic acid that integrates into the host genome.

In some cases, a subject nucleic acid (e.g., an expression vector) includes an origin of replication. By an “origin of replication” or “replication origin” it is meant a particular sequence in a genome at which replication is initiated. Origins of replication are found in prokaryotes and eukaryotes, and are required for the propagation of plasmids episomally (i.e., extragenomically) in host cells. By a “plasmid” it is meant a circular expression vector that comprises an origin of replication and a selectable marker.

In some cases, a subject nucleic acid (e.g., a plasmid) includes an origin of replication (e.g., one that is functional in a Bacteroides cell). However, in some embodiments, a subject nucleic acid (e.g., plasmid) has an origin of replication that is not functional in Bacteroides cells, but is functional in cells that are not Bacteroides cells (e.g., other prokaryotes such as E. coli). Such nucleic acids (e.g., plasmids such as an NBU2 integration plasmid) can be maintained episomally (e.g., propagated, amplified, isolated from, stored in, etc.) in prokaryotic cells that are not Bacteroides cells (e.g., they can in some cases be maintained episomally in E. coli), but are not maintained episomally in Bacteroides cells. Thus, instead of being maintained episomally in Bacteroides cells, these nucleic acids can be used for the integration of sequences (e.g., from a plasmid) into the genome of a Bacteroides cell (e.g., see the examples section below). In some cases, a subject nucleic acid is integrated into the genome of a Bacteroides cell.

Methods

Nucleic acid expression using a subject nucleic acid (e.g., one that is integrated into a Bacteroides cell's genome, an expression vector, a plasmid, and the like) finds use in many applications, including research and therapeutic applications. Subject methods include but are not necessarily limited to methods of expressing a transgene in a prokaryotic cell (e.g., a Bacteroides cell), detectably labeling a Bacteroides cell in an animal's gut (e.g., distinguishably labeling two or more Bacteroides cells), delivering a protein to an individual's gut, and treating an individual (e.g., by delivering a protein-secreting Bacteroides cell to an individual's gut).

In some embodiments (e.g., in methods of detectably labeling, delivering, and/or treating) a Bacteroides cell (e.g., a cell comprising a subject nucleic acid) is introduced into an individual (e.g., into the individual's gut). The individual can be any mammalian species, e.g., rodent (e.g., mouse, rat), ungulate, cow, pig, sheep, camel, rabbit, horse, dog, cat, primate, non-human primate, human, etc. The individual may be a neonate, a juvenile, or an adult. In some cases, the introduction is by oral administration. Any convenient type of oral administration can be used. For example, oral administration can include delivery via eating (e.g., incorporated into food), drinking (e.g., incorporated into a solution such as drinking water), oral gavage (e.g., using a stomach tube), aerosol spray, tablets, capsules, pills, powders, and the like. In some embodiments, a Bacteroides cell (e.g., a cell comprising a subject nucleic acid) is introduced into an individual (e.g., into the individual's gut) by delivery into the individual's colon.

As described for compositions, cells of the subject methods can be any prokaryotic cell in which a subject promoter is operable (e.g., prokaryotic cell, Bacteroides cell, E. coli cell). In some cases, the cell is a Bacteroides cell. In some cases, the Bacteroides cell is a species selected from: B. fragilis (Bf), B. distasonis (Bd), B. thetaiotaomicron (Bt), B. vulgatus (Bv), B. ovatus (Bo), B. eggerthii (Be), B. merdae (Bm), B. stercoris (Bs), B. uniformis (Bu), and B. caccae (Bc). In some cases, the Bacteroides cell is a species selected from: B. fragilis (Bf), B. thetaiotaomicron (Bt), B. vulgatus (Bv), B. ovatus (Bo), and B. uniformis (Bu). In some cases, the Bacteroides cell is a species selected from: B. thetaiotaomicron (Bt), B. vulgatus (Bv), B. ovatus (Bo), and B. uniformis (Bu).

In some cases, a subject method is a method of expressing a subject nucleic acid in a prokaryotic cell. Such methods include introducing a subject nucleic acid into a prokaryotic cell. Any convenient method can be used to introduce a nucleic acid into a prokaryotic cell, e.g., by electroporation (e.g., using electro-competent cells), by conjugation, by chemical methods (e.g., using chemically competent cells), and the like. The introduced nucleic acid may or may not be integrated (covalently linked) into the genome of the cell, and as described above this may depend on the presence or absence of an origin of replication that is functional in the cell. For example, in some cases, the introduced nucleic acid integrates into the genome of the cell (as a chromosomal integrant), e.g., a nucleic acid may integrate into the genome of a Bacteroides cell if the nucleic acid does not have an origin of replication that is functional in that Bacteroides cell. In some cases, the introduced nucleic acid is maintained on an episomal element (extrachromosomal element) such as a plasmid.

In some cases, a subject method is a method of detectably labeling a Bacteroides cell in an animal's gut. In such cases, the Bacteroides cell (or population of cells) that is introduced into the gut includes a subject nucleic acid that includes a transgene whose expression product detectably labels the cell. The phrase “detectably label” as used herein refers to any expression product (RNA, protein) that is detectable. The expression product (the label) can itself be detectable (directly detectable label) (e.g., a fluorescent protein), or the label can be indirectly detectable, e.g., in the case of an enzymatic label, the enzyme (e.g., luciferase) may catalyze a chemical alteration of a substrate compound or composition and the product of the reaction is detectable.

In some cases, two or more Bacteroides cells (e.g., two distinct populations of Bacteroides cells) are labeled in such a way that the two or more cells (or two or more cell populations) are distinguishable from one another. The two cells (or cell populations) can differ from one another in a variety of ways. For example, the cells can be of different species (e.g., when it is desired to assay competition or balance between two different species), the cells can be expressing different transgenes (e.g., different therapeutic peptides), and the like.

Distinguishably labeling two or more cells (or cell populations) from one another can be achieved in a number of different ways and any convenient way is suitable. For example, a first cell (or cell population) can be labeled with a first transgene (i.e., the first cell includes a subject nucleic acid having a first transgene, where an expression product of the first transgene is detectable), while a second cell (or cell population) can be labeled with a second transgene (i.e., the second cell includes a subject nucleic acid having a second transgene, where an expression product of the second transgene is detectable). The two cells can be distinguishably labeled if the first and second expression products are different. As an illustrative example (Case 1), such would be the case if (1) the first cell included a subject nucleic acid in which a sequence encoding a green fluorescent protein (GFP) was operably linked to a subject promoter, and (2) the second cell included a subject nucleic acid in which a sequence encoding a red fluorescent protein (RFP) was operably linked to a subject promoter. In Case 1, the promoters in the first and second cells could be the same promoter because the expression products themselves are distinguishable and thus, the first and second cells would be distinguishable from one another because the labels are distinguishable from one another.

However, two cells could also be distinguishably labeled from one another even if they were producing the same transgene expression product (e.g., GFP). As an illustrative example (Case 2), such would be the case if (1) the first cell included a subject nucleic acid in which a sequence encoding a green fluorescent protein (GFP) was operably linked to a subject promoter, and (2) the second cell included a subject nucleic acid in which a sequence encoding a green fluorescent protein (GFP) (the same transgene as the first cell) was operably linked to a different promoter of different strength. In Case 2, the promoters in the first and second cells can be different so that the amount of transgene expression product produced is different between the first and second cells. The cells would then be distinguishable from one another because one would be characteristically brighter than the other.
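The logic of Case 1 and Case 2 can be summarized in a short Python sketch. Everything named here is hypothetical and for illustration only: the `LabeledCell` class, the `promoter_strength` field (a relative expression level in arbitrary units), and the `min_fold` threshold, which stands in for whatever brightness difference a given detection method can actually resolve.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledCell:
    reporter: str             # detectable expression product, e.g. "GFP" or "RFP"
    promoter_strength: float  # hypothetical relative expression level (arbitrary units)


def distinguishable(a: LabeledCell, b: LabeledCell, min_fold: float = 2.0) -> bool:
    """Return True if two labeled cells could be told apart under Case 1 or Case 2."""
    # Case 1: different expression products (e.g., GFP vs. RFP).
    if a.reporter != b.reporter:
        return True
    # Case 2: same product, different promoter strength, so one cell is brighter.
    low, high = sorted((a.promoter_strength, b.promoter_strength))
    if low <= 0:
        return high > 0
    return high / low >= min_fold


if __name__ == "__main__":
    print(distinguishable(LabeledCell("GFP", 1.0), LabeledCell("RFP", 1.0)))  # Case 1
    print(distinguishable(LabeledCell("GFP", 1.0), LabeledCell("GFP", 5.0)))  # Case 2
```

The fold-change threshold is an assumption of the sketch; the text above requires only that one cell be characteristically brighter than the other.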

In some cases, a detectably labeled Bacteroides cell (or cell population) is introduced into an animal's gut. In some cases, two or more distinguishably labeled Bacteroides cells (e.g., cell populations) can be introduced into an animal's gut. If desired, the label(s) can then be detected at numerous time points (tracked), and/or various parameters can be assayed. For example, measuring the label(s) can provide information about survival of the labeled cells in the gut, the sub-location of cells within the gut, the number of cells present of particular tracked species within the gut, the relative number of tracked species, and the like.

In some cases, a subject method is a method of delivering a protein to an individual's gut (which in some cases can be considered a method of treating). In some such cases, a Bacteroides cell is introduced into the gut of an animal, where the cell includes a subject nucleic acid encoding a subject fusion protein (e.g., a secreted Bacteroides polypeptide fused to a heterologous polypeptide of interest, e.g., a therapeutic peptide such as an anti-inflammation peptide or a metabolic enzyme). Any convenient fusion protein of the subject fusion proteins described above can be used. The polypeptide of interest of such a fusion protein can be one that has any desirable activity in the gut (e.g., in the extracellular environment of the gut, inside of the Bacteroides cell, or inside of a cell of the animal, e.g., if a subject fusion protein is secreted from the bacteria via outer membrane vesicles (OMVs) and the contents of the OMVs make their way into a host cell). As noted above, in some cases, the polypeptide of interest is a therapeutic peptide (e.g., a peptide that can, when secreted from a Bacteroides cell, e.g., in the gut of an individual via OMVs or via classical secretion across the outer membrane, have a positive impact on a clinical parameter of the individual) and the method can be considered a method of treating an individual in need thereof. For example, the polypeptide of interest can: have antimicrobial (antibiotic) activity (e.g., against one or more gut microbes), function to change gut environmental parameters (e.g., pH control), affect inflammation, provide an enzymatic activity to the Bacteroides cell (internal to the cell), and the like. All of these types of polypeptides of interest can be considered therapeutic peptides.

Because a large variety of polypeptides of interest (any polypeptide of interest) can be delivered using a subject secreted fusion protein (e.g., one with a cleavable linker between the polypeptide of interest and the secreted Bacteroides protein), a large variety of individuals with a large variety of ailments can be targeted (i.e., a subject Bacteroides cell can be introduced into a variety of individuals with a variety of ailments). Diseases that can be treated with a therapeutic peptide include but are not limited to diseases that are impacted by the gut microbiota, including obesity, diabetes, heart disease, central nervous system diseases, rheumatoid arthritis, metabolic disorders, and cancer. For example, in some cases, the individual has gut inflammation, and in some such cases the individual has an inflammatory disease (e.g., Crohn's disease, ulcerative colitis, and the like), and in some cases gut inflammation can indirectly impact the disease, such as colorectal cancer or obesity.

As noted above, examples of therapeutic peptides that can be used as polypeptides of interest in a subject fusion protein include but are not limited to metabolic enzymes and anti-inflammatory peptides, which can include but are not limited to those presented in Table 8 (SEQ ID NOs: 411-417). In some cases (e.g., in some cases where the individual has gut inflammation, e.g., colitis), the polypeptide of interest includes an amino acid sequence selected from: RYTVELA (SEQ ID NO: 411) (Peptide 101.10), VTLVGNTFLQSTINRTIGVL (SEQ ID NO: 412) (Fp MAM-pep5), and MQPPGC (SEQ ID NO: 413) (CD80-CAP1). In some cases (e.g., in some cases where the individual has gut inflammation, e.g., colitis), the polypeptide of interest includes an amino acid sequence selected from: RYTVELA (SEQ ID NO: 411) (Peptide 101.10), and VTLVGNTFLQSTINRTIGVL (SEQ ID NO: 412) (Fp MAM-pep5). In some cases, the polypeptide of interest includes the amino acid sequence RYTVELA (SEQ ID NO: 411) (Peptide 101.10). In some cases, the polypeptide of interest includes the amino acid sequence VTLVGNTFLQSTINRTIGVL (SEQ ID NO: 412) (Fp MAM-pep5).

Kits

Also provided are kits, e.g., for practicing any of the above methods. The contents of the subject kits may vary greatly. A kit can include: (i) a first subject nucleic acid (e.g., a nucleic acid that includes a promoter operable in a Bacteroides cell operably linked to a heterologous nucleotide sequence of interest), and (ii) at least one of: a Bacteroides cell, and a second subject nucleic acid. In some cases, the promoters of the first and second nucleic acids are different. In some cases, the nucleotide sequences of interest of the first and second nucleic acids are different. In some cases, a kit includes two or more (3 or more, 4 or more, etc.) subject nucleic acids, each with a different promoter (e.g., each with a promoter of a different strength). In some cases, the nucleic acid(s) of a subject kit is a plasmid. In some cases, the plasmid(s) can be propagated episomally in E. coli, but do not contain an origin of replication that is functional in Bacteroides cells. In some cases, a subject kit includes one or more species of Bacteroides cells selected from: B. fragilis (Bf), B. distasonis (Bd), B. thetaiotaomicron (Bt), B. vulgatus (Bv), B. ovatus (Bo), B. eggerthii (Be), B. merdae (Bm), B. stercoris (Bs), B. uniformis (Bu), and B. caccae (Bc). In some cases, the cell(s) of the kit do not (yet) contain a subject nucleic acid. In some cases, the cell(s) of the kit include a subject nucleic acid integrated into the genome of the cell.

In addition to the above components, the subject kits can further include instructions for practicing the subject methods. These instructions may be present in the subject kits in a variety of forms, one or more of which may be present in the kit. One form in which these instructions may be present is as printed information on a suitable medium or substrate, e.g., a piece or pieces of paper on which the information is printed, in the packaging of the kit, in a package insert, etc. Yet another means would be a computer readable medium, e.g., diskette, CD, flash drive, etc., on which the information has been recorded. Yet another means that may be present is a website address which may be used via the internet to access the information at a removed site. Any convenient means may be present in the kits.

Examples of Non-Limiting Aspects of the Disclosure

Aspects, including embodiments, of the present subject matter described above may be beneficial alone or in combination with one or more other aspects or embodiments.

Without limiting the foregoing description, certain non-limiting aspects of the disclosure (Set A, numbered 1-73, and Set B, numbered 1-77) are provided below. As will be apparent to those of skill in the art upon reading this disclosure, each of the individually numbered aspects may be used or combined with any of the preceding or following individually numbered aspects. This is intended to provide support for all such combinations of aspects and is not limited to combinations of aspects explicitly provided below:

Set A

1. A nucleic acid for expression in a prokaryotic cell, the nucleic acid comprising:

    (a) a promoter operable in a Bacteroides cell, wherein the promoter comprises a nucleotide sequence having:
        (i) 80% or more identity with the nucleotide sequence: GTTAA (n)₃₋₇ GTTAA (n)₃₆₋₃₈ TA (n)₂ TTTG (SEQ ID NO: 400), and/or
        (ii) 80% or more identity with the phage promoter sequence set forth in any of SEQ ID NOs: 388 and 407; and
    (b) a heterologous nucleotide sequence of interest that is operably linked to the promoter.

2. The nucleic acid according to 1, wherein the nucleotide sequence of interest is a transgene sequence that encodes a protein.

3. The nucleic acid according to 2, wherein the protein encoded by the transgene sequence is a reporter protein, a selectable marker protein, a metabolic enzyme, and/or a therapeutic protein.

4. The nucleic acid according to 2 or 3, wherein the protein encoded by the transgene sequence is a fusion protein comprising a cleavable linker and a secreted Bacteroides polypeptide fused to a heterologous polypeptide of interest, wherein the cleavable linker is positioned between the secreted Bacteroides polypeptide and the polypeptide of interest.

5. The nucleic acid according to 1, wherein the nucleotide sequence of interest is a transgene sequence that encodes a non-coding RNA.

6. The nucleic acid according to 1, wherein the nucleotide sequence of interest is an insertion site.

7. The nucleic acid according to 6, wherein the insertion site is a multiple cloning site.

8. The nucleic acid according to any of 1-7, wherein the promoter comprises a nucleotide sequence that has 80% or more sequence identity with the wild type Bacteroides phage promoter sequence set forth in SEQ ID NO: 388.

9. The nucleic acid according to any of 1-8, wherein the promoter comprises the nucleotide sequence set forth in any of SEQ ID NOs: 381-388.

10. The nucleic acid according to 8 or 9, wherein the promoter is a synthetic promoter.

11. The nucleic acid according to any of 1-7, wherein the promoter comprises the nucleotide sequence GTTAA (n)₃₋₇ GTTAA (n)₃₆₋₃₈ TA (n)₂ TTTG (SEQ ID NO: 400).
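As an illustration of the spacing constraints in the sequence GTTAA (n)₃₋₇ GTTAA (n)₃₆₋₃₈ TA (n)₂ TTTG recited in aspect 11, the following Python sketch expresses it as a regular expression and searches a DNA string for an exact occurrence. The function name `find_promoter_motif` is hypothetical, and the sketch assumes that each n denotes any one of A, C, G, or T. It tests exact matches only and does not evaluate percent identity as recited in aspect 1.

```python
import re
from typing import Optional, Tuple

# Each "n" is assumed to be any of A, C, G, T; the subscripts become repeat counts.
PROMOTER_MOTIF = re.compile(r"GTTAA[ACGT]{3,7}GTTAA[ACGT]{36,38}TA[ACGT]{2}TTTG")


def find_promoter_motif(seq: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first exact motif match in seq, or None."""
    match = PROMOTER_MOTIF.search(seq.upper())
    return (match.start(), match.end()) if match else None


if __name__ == "__main__":
    # Synthetic test sequence built only to satisfy the spacing; not a real promoter.
    example = "CC" + "GTTAA" + "ACG" + "GTTAA" + "A" * 36 + "TA" + "CC" + "TTTG"
    print(find_promoter_motif(example))
```

The example string is constructed purely to exercise the pattern and does not correspond to any sequence in the Sequence Listing.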

12. The nucleic acid according to any of 1-11, further comprising a sequence encoding a ribosomal binding site (RBS), wherein the sequence encoding the ribosomal binding site (RBS) is operably linked to the promoter and to the nucleotide sequence of interest, and is positioned 5′ of the nucleotide sequence of interest.

13. The nucleic acid according to 12, wherein the sequence encoding the RBS comprises a nucleotide sequence that has 80% or more sequence identity with the sequence set forth in any of SEQ ID NOs: 10-18.

14. The nucleic acid according to 12, wherein the RBS is a synthetic RBS and the sequence encoding the synthetic RBS comprises a nucleotide sequence that has 80% or more sequence identity with the sequence set forth in any of SEQ ID NOs: 11-18.

15. The nucleic acid according to any of 12-14, comprising the nucleotide sequence set forth in any of SEQ ID NOs: 20-83.

16. The nucleic acid according to any of 1-15, further comprising a terminator sequence upstream of the promoter.

17. The nucleic acid according to any of 1-16, wherein the nucleic acid is a plasmid.

18. The nucleic acid according to 17, wherein the plasmid comprises an origin of replication that functions in prokaryotic cells other than Bacteroides cells, but does not function in Bacteroides cells.

19. A nucleic acid for expression in a prokaryotic cell, the nucleic acid comprising, in 5′ to 3′ order:

    (a) a promoter operable in a prokaryotic cell;
    (b) a sequence encoding a synthetic ribosomal binding site (RBS), wherein said sequence: (i) is operably linked to the promoter, and (ii) comprises a nucleotide sequence that has 80% or more sequence identity with the sequence set forth in any of SEQ ID NOs: 10-18; and
    (c) a nucleotide sequence of interest that is operably linked to the promoter and to the synthetic RBS.

20. The nucleic acid according to 19, wherein the sequence encoding the synthetic RBS comprises the nucleotide sequence set forth in any of SEQ ID NOs: 11-18.

21. The nucleic acid according to 19 or 20, wherein the nucleotide sequence of interest encodes a protein.

22. The nucleic acid according to 19 or 20, wherein the nucleotide sequence of interest is an insertion site.

23. A prokaryotic cell comprising the nucleic acid of any of 1-22.

24. The prokaryotic cell of 23, wherein the nucleic acid is not integrated into a chromosome of the prokaryotic cell.

25. The prokaryotic cell of 23, wherein the nucleic acid is integrated into a chromosome of the prokaryotic cell.

26. The prokaryotic cell of any of 23-25, wherein the cell is a Bacteroides cell.

27. The prokaryotic cell of any of 23-25, wherein the cell is a prokaryotic cell that is not a Bacteroides cell.

28. The prokaryotic cell of 27, wherein the cell is an E. coli cell.

29. A kit for expression in prokaryotic cells, the kit comprising:

    (i) a first nucleic acid of any of 1-22; and
    (ii) at least one of: (a) a Bacteroides cell, and (b) a second nucleic acid of any of 1-22.

30. The kit of 29, comprising the first and second nucleic acids, each of which comprise

    (i) a promoter that comprises a nucleotide sequence that has 80% or more sequence identity with the wild type Bacteroides phage promoter sequence set forth in SEQ ID NO: 388, and
    (ii) a sequence encoding a synthetic ribosomal binding site (RBS) that comprises a nucleotide sequence that has 80% or more sequence identity with the sequence set forth in any of SEQ ID NOs: 11-18.

31. The kit of 30, wherein the first and second nucleic acids each comprise the nucleotide sequence set forth in any of SEQ ID NOs: 20-83.

32. The kit of any of 29-31, wherein the first and/or second nucleic acid is a plasmid.

33. The kit of any of 29-32, comprising a third nucleic acid of any of 1-22.

34. A method of expressing a nucleic acid in a prokaryotic cell, the method comprising: introducing the nucleic acid of any of 1-22 into a prokaryotic cell.

35. The method according to 34, wherein the prokaryotic cell is a Bacteroides cell.

36. The method according to 35, wherein the Bacteroides cell is a cell of a species selected from: B. fragilis (Bf), B. distasonis (Bd), B. thetaiotaomicron (Bt), B. vulgatus (Bv), B. ovatus (Bo), B. eggerthii (Be), B. merdae (Bm), B. stercoris (Bs), B. uniformis (Bu), and B. caccae (Bc).

37. The method according to 34, wherein the prokaryotic cell is an E. coli cell.

38. The method according to any of 34-37, wherein the nucleotide sequence of interest is a transgene encoding a fusion protein comprising a cleavable linker and a secreted Bacteroides polypeptide fused to a heterologous polypeptide of interest, wherein the cleavable linker is positioned between the secreted Bacteroides polypeptide and the polypeptide of interest.

39. A method of detectably labeling a Bacteroides cell in an animal's gut, the method comprising:

    introducing, into the gut of the animal, a first detectably labeled Bacteroides cell comprising a first nucleic acid comprising:
    (a) a first promoter operable in Bacteroides cells, wherein the first promoter comprises a nucleotide sequence having:
        (i) 80% or more identity with the nucleotide sequence: GTTAA (n)₃₋₇ GTTAA (n)₃₆₋₃₈ TA (n)₂ TTTG (SEQ ID NO: 400), and/or
        (ii) 80% or more identity with the phage promoter sequence set forth in any of SEQ ID NOs: 388 and 407; and
    (b) a first transgene comprising a nucleotide sequence that encodes a first expression product that detectably labels the first detectably labeled Bacteroides cell, wherein the first transgene is: (i) heterologous relative to the first promoter and (ii) operably linked to the first promoter.

40. The method according to 39, wherein the method comprises introducing, into the gut of the animal, a second detectably labeled Bacteroides cell comprising a second nucleic acid comprising:

    (a) a second promoter operable in Bacteroides cells, wherein the second promoter comprises a nucleotide sequence having:
        (i) 80% or more identity with the nucleotide sequence: GTTAA (n)₃₋₇ GTTAA (n)₃₆₋₃₈ TA (n)₂ TTTG (SEQ ID NO: 400), and/or
        (ii) 80% or more identity with the phage promoter sequence set forth in any of SEQ ID NOs: 388 and 407; and
    (b) a second transgene comprising a nucleotide sequence that encodes a second expression product that detectably labels the second detectably labeled Bacteroides cell, wherein the second transgene is: (i) heterologous relative to the second promoter and (ii) operably linked to the second promoter,
    wherein the first and second detectably labeled Bacteroides cells are distinguishable from one another.

41. The method according to 40, wherein the first and second expression products are distinguishable from one another.

42. The method according to 41, wherein the first and second promoters are the same.

43. The method according to 40, wherein the first and second expression products are indistinguishable from one another, but the first and second promoters are different from one another and produce different amounts of the first and second expression products.

44. The method according to any of 39-43, wherein the first expression product is a reporter protein.

45. The method according to 44, wherein the reporter protein is a fluorescent protein.

46. The method according to any of 39-45, wherein the first Bacteroides cell is the same species as the second Bacteroides cell.

47. The method according to any of 39-45, wherein the first Bacteroides cell is not the same species as the second Bacteroides cell.

48. A fusion protein comprising: a secreted Bacteroides polypeptide fused to a heterologous polypeptide of interest.

49. The fusion protein of 48, wherein the secreted Bacteroides polypeptide is a secreted fragment or secreted variant of a naturally occurring Bacteroides polypeptide.

50. The fusion protein of 48 or 49, wherein the secreted Bacteroides polypeptide comprises an amino acid sequence that has 80% or more sequence identity with the amino acid sequence set forth in any of SEQ ID NOs: 458-484.

51. The fusion protein of 48, wherein the secreted Bacteroides polypeptide is a naturally occurring secreted protein of a Bacteroides cell.

52. The fusion protein of 48 or 51, wherein the secreted Bacteroides polypeptide comprises the amino acid sequence set forth in any of SEQ ID NOs: 458-484.

53. The fusion protein of 52, wherein the secreted Bacteroides polypeptide comprises the amino acid sequence set forth in SEQ ID NO: 459.

54. The fusion protein of any of 48-53, comprising a cleavable linker positioned between the secreted Bacteroides polypeptide and the polypeptide of interest.

55. The fusion protein of 54, wherein the cleavable linker is cleavable by one or more gut proteases.

56. The fusion protein of 55, wherein the cleavable linker is cleavable by one or more gut proteases selected from: a trypsin, a chymotrypsin, and an elastase.

57. The fusion protein of 55, wherein the cleavable linker is set forth in any of SEQ ID NOs: 420-453.

58. The fusion protein of any of 48-57, wherein the polypeptide of interest comprises the amino acid sequence of any one of the peptides presented in Table 8 (SEQ ID NOs: 411-417).

59. The fusion protein of 58, wherein the polypeptide of interest comprises the amino acid sequence RYTVELA (SEQ ID NO: 411) or VTLVGNTFLQSTINRTIGVL (SEQ ID NO: 412).

60. A nucleic acid encoding the fusion protein of any of 48-59.

61. The nucleic acid of 60, wherein the nucleic acid is a plasmid.

62. The nucleic acid of 61, wherein the plasmid comprises an origin of replication that functions in prokaryotic cells other than Bacteroides cells, but does not function in Bacteroides cells.

63. A method of delivering a protein to an individual's gut, the method comprising: introducing, into an individual's gut, a Bacteroides cell comprising the nucleic acid according to any one of 1-22 and 60-62.

64. The method according to 63, wherein the nucleic acid is integrated into the genome of the Bacteroides cell.

65. The method according to 63 or 64, wherein the individual has a disease impacted by gut microbiota.

66. The method according to 65, wherein the individual has a disease selected from: obesity, diabetes, heart disease, central nervous system diseases, rheumatoid arthritis, metabolic disorders, and cancer.

67. The method according to 63 or 64, wherein the individual has gut inflammation.

68. The method according to 63 or 64, wherein the individual has colitis.

69. The method according to any of 65-68, wherein the method is a method of treating the individual.

70. The method according to any of 63-69, wherein the Bacteroides cell is a cell of a species selected from: B. fragilis (Bf), B. distasonis (Bd), B. thetaiotaomicron (Bt), B. vulgatus (Bv), B. ovatus (Bo), B. eggerthii (Be), B. merdae (Bm), B. stercoris (Bs), B. uniformis (Bu), and B. caccae (Bc).

71. The method of 70, wherein the Bacteroides cell is a B. thetaiotaomicron (Bt) cell.

72. The method according to any of 63-71, wherein the polypeptide of interest comprises the amino acid sequence RYTVELA (SEQ ID NO: 411) or VTLVGNTFLQSTINRTIGVL (SEQ ID NO: 412).

73. A method of treating an individual in need thereof, comprising:

    performing the method of any of 65-68.

Set B

1. A nucleic acid, comprising:

    (a) a promoter operable in a prokaryotic cell, wherein the promoter comprises a nucleotide sequence comprising one or more of the following:
        (i) 80% or more sequence identity of defined nucleotides of the nucleotide sequence: GTTAA (n)₄₋₇ GTTAA (n)₃₄₋₃₈ TA (n)₂ TTTG,
        (ii) 80% or more sequence identity with a sequence set forth in any of SEQ ID NOs: 388 and 407,
        (iii) a nucleotide sequence comprising GTTAA (n)₄₋₇ GTTAA,
        (iv) a nucleotide sequence comprising GTTAA (n)₄₄₋₅₀ TA,
        (v) a nucleotide sequence comprising GTTAA (n)₄₈₋₅₄ TTTG,
        (vi) a nucleotide sequence comprising GTTAA (n)₃₆₋₃₈ TA,
        (vii) a nucleotide sequence comprising GTTAA (n)₄₀₋₄₂ TTTG,
        (viii) a nucleotide sequence comprising GTTAA (n)₃₋₇ GTTAA (n)₃₆₋₃₈ TA,
        (ix) a nucleotide sequence comprising GTTAA (n)₃₋₇ GTTAA (n)₄₀₋₄₂ TTTG,
        (x) a nucleotide sequence comprising GTTAA (n)₄₄₋₅₀ TA (n)₂ TTTG,
        (xi) a nucleotide sequence comprising GTTAA (n)₃₆₋₃₈ TA (n)₂ TTTG,
        (xii) a nucleotide sequence comprising GTTAA (n)₀₋₂₀ GTTAA (n)₁₀₋₆₀ TA (n)₀₋₁₀ TTTG,
        (xiii) a nucleotide sequence comprising TTAA (n)₀₋₁₀ TTAA (n)₃₀₋₅₀ TA (n)₂ TTTG,
        (xiv) a nucleotide sequence comprising GTTAA (n)₄₋₇ GTTAA (n)₃₆₋₃₉ TA (n)₂ TTTGC,
        (xv) a nucleotide sequence comprising GTTAA (n)₄₋₇ GTTAA (n)₃₆₋₃₉ TA (n)₂ TTTG,
        (xvi) a nucleotide sequence comprising GTTAA (n)₄₋₇ GTTAA (n)₃₄₋₃₈ TA (n)₂ TTTG,
        (xvii) a nucleotide sequence comprising GTTAA (n)₄₋₇ GTTAA (n)₃₆₋₃₈ TA (n)₂ TTTG,
        (xviii) a nucleotide sequence comprising GTTAA (n)₃₋₇ GTTAA (n)₃₆₋₃₈ TA (n)₂ TTTG,
        (xix) a nucleotide sequence comprising GTTAA (n)₄₋₇ GTTAA (n)₁₂₋₁₆ TTG (n)₁₈₋₂₂ TA (n)₂ TTTGC,
        (xx) a nucleotide sequence comprising GTTAA (n)₃₋₇ GTTAA (n)₁₂₋₁₆ TTG (n)₁₈₋₂₂ TA (n)₂ TTTG,
        (xxi) a nucleotide sequence comprising GTTAA (n)₄₋₈ GTTAA (n)₁₂₋₁₆ TTG (n)₁₈₋₂₂ TA (n)₂ TTTG, and
        (xxii) a nucleotide sequence comprising GTTAA (n)₄₋₇ GTTAA (n)₁₂₋₁₆ TTG (n)₁₈₋₂₂ TA (n)₂ TTTG,
        wherein each n is independently selected from A, C, G, and T; and
    (b) a nucleotide sequence of interest that is operably linked to the promoter, wherein the nucleotide sequence of interest and the promoter are not found operably linked in nature.

2. The nucleic acid of 1, wherein the prokaryotic cell is a Bacteroides cell.

3. A nucleic acid, comprising:

    (a) a promoter operable in a Bacteroides cell, and
    (b) a nucleotide sequence of interest that is operably linked to the promoter, wherein the nucleotide sequence of interest and the promoter are not found operably linked in nature, wherein the promoter provides one or more of the following when the nucleic acid is expressed in the Bacteroides cell:
        (i) an increase in mRNA production of at least 30% relative to a native Bacteroides promoter,
        (ii) an increase in fluorescence of at least 2000% relative to autofluorescence, wherein the nucleotide sequence of interest encodes super-folding GFP, or
        (iii) a cytoplasmic protein concentration of at least 1.5 μM, wherein the nucleotide sequence of interest encodes the protein.

4. The nucleic acid of 3, wherein the native Bacteroides promoter is a native Bacteroides rRNA promoter.

5. The nucleic acid of 3, wherein the increase in mRNA production is at least 50%.

6. The nucleic acid of 3, wherein the increase in mRNA production is at least 100%.

7. The nucleic acid of 3, wherein the increase in fluorescence is at least 5000%.

8. The nucleic acid of 3, wherein the increase in fluorescence is at least 8000%.

9. The nucleic acid of 3, wherein the cytoplasmic protein concentration is at least 2 μM.

10. The nucleic acid of 3, wherein the cytoplasmic protein concentration is at least 5 μM.

11. The nucleic acid of 3, wherein the cytoplasmic protein concentration is at least 10 μM.

12. The nucleic acid of 3, wherein the protein is luciferase.

13. The nucleic acid of 1 or 3, wherein the promoter is a phage promoter or a functional fragment thereof.

14. The nucleic acid of 12, wherein the phage is ϕB124-14.

15. The nucleic acid of 1 or 3, wherein the promoter is a non-naturally occurring promoter.

16. The nucleic acid of 1 or 3, wherein the promoter comprises a nucleotide sequence having 80% or more sequence identity with the nucleotide sequence: GTTAA (n)₃₋₇ GTTAA (n)₃₆₋₃₈ TA (n)₂ TTTG (SEQ ID NO: 400).

17. The nucleic acid of any of 1-7, wherein the promoter comprises the nucleotide sequence GTTAA (n)₃₋₇ GTTAA (n)₃₆₋₃₈ TA (n)₂ TTTG (SEQ ID NO: 400).

18. The nucleic acid of 1 or 3, wherein the promoter comprises a nucleotide sequence that has 80% or more sequence identity with the sequence set forth in any of SEQ ID NOs: 388 and 407.

19. The nucleic acid of 1 or 3, wherein the promoter comprises the nucleotide sequence set forth in any of SEQ ID NOs: 381-388.

20. The nucleic acid of 1 or 3, wherein the nucleotide sequence of interest comprises a transgene sequence that encodes a protein.

21. The nucleic acid of 17, wherein the protein encoded by the transgene sequence comprises a reporter protein, a selectable marker protein, a metabolic enzyme, or a therapeutic protein.

22. The nucleic acid of 17, wherein the protein encoded by the transgene sequence is a fusion protein comprising a cleavable linker and a secreted Bacteroides polypeptide fused to a heterologous polypeptide of interest, wherein the cleavable linker is positioned between the secreted Bacteroides polypeptide and the polypeptide of interest.

23. The nucleic acid of 1 or 3, wherein the nucleotide sequence of interest comprises a transgene sequence that encodes a non-coding RNA.

24. The nucleic acid of 1 or 3, wherein the nucleotide sequence of interest is an insertion site.

25. The nucleic acid of 24, wherein the insertion site is a multiple cloning site.

26. The nucleic acid of any of 1-25, further comprising a sequence encoding a ribosomal binding site (RBS), wherein the sequence encoding the ribosomal binding site (RBS) is operably linked to the promoter and to the nucleotide sequence of interest, and is positioned 5′ of the nucleotide sequence of interest.

27. The nucleic acid of any of 1-26, further comprising a terminator sequence upstream of the promoter.

28. The nucleic acid of any of 1-27, wherein the nucleic acid is a plasmid.

29. The nucleic acid of 28, wherein the plasmid comprises an origin of replication that functions in prokaryotic cells other than Bacteroides cells, but does not function in Bacteroides cells.

30. A prokaryotic cell comprising the nucleic acid of any of 1-29.

31. The prokaryotic cell of 21, wherein the nucleic acid is not integrated into a chromosome of the prokaryotic cell.

32. The prokaryotic cell of 21, wherein the nucleic acid is integrated into a chromosome of the prokaryotic cell.

33. The prokaryotic cell of any of 21-22, wherein the prokaryotic cell is a Bacteroides cell.

34. The prokaryotic cell of any of 21-22, wherein the prokaryotic cell is not a Bacteroides cell.

35. The prokaryotic cell of 24, wherein the prokaryotic cell is an E. coli cell.

36. A method of expressing a nucleic acid in a prokaryotic cell, the method comprising: introducing the nucleic acid of any of 1-29 into the prokaryotic cell.

37. The method of 36, wherein the prokaryotic cell is a Bacteroides cell.

38. The method of 27, wherein the Bacteroides cell is a cell of a species selected from: B. fragilis (Bf), B. distasonis (Bd), B. thetaiotaomicron (Bt), B. vulgatus (Bv), B. ovatus (Bo), B. eggerthii (Be), B. merdae (Bm), B. stercoris (Bs), B. uniformis (Bu), and B. caccae (Bc).

39. The method of 36, wherein the prokaryotic cell is an E. coli cell.

40. The method of any of 36-29, wherein the nucleotide sequence of interest is a transgene encoding a fusion protein comprising a cleavable linker and a secreted Bacteroides polypeptide fused to a heterologous polypeptide of interest, wherein the cleavable linker is positioned between the secreted Bacteroides polypeptide and the polypeptide of interest.

41. A fusion protein comprising: a secreted Bacteroides polypeptide fused to a heterologous polypeptide of interest.

42. The fusion protein of 31, wherein the secreted Bacteroides polypeptide is a secreted fragment or secreted variant of a naturally occurring Bacteroides polypeptide.

43. The fusion protein of 31-42, wherein the secreted Bacteroides polypeptide comprises an amino acid sequence that has 80% or more sequence identity with an amino acid sequence set forth in any of SEQ ID NOs: 458-484.

44. The fusion protein of 31, wherein the secreted Bacteroides polypeptide is a naturally occurring secreted protein of a Bacteroides cell.

45. The fusion protein of 31 or 44, wherein the secreted Bacteroides polypeptide comprises an amino acid sequence set forth in any of SEQ ID NOs: 458-484.

46. The fusion protein of 45, wherein the secreted Bacteroides polypeptide comprises the amino acid sequence set forth in SEQ ID NO: 459.

47. The fusion protein of any of 31-46, comprising a cleavable linker positioned between the secreted Bacteroides polypeptide and the polypeptide of interest.

48. The fusion protein of 33, wherein the cleavable linker is cleavable by one or more gut proteases.

49. The fusion protein of 34, wherein the cleavable linker is cleavable by one or more gut proteases selected from: a trypsin, a chymotrypsin, and an elastase.

50. The fusion protein of 34, wherein the cleavable linker is set forth in any of SEQ ID NOs: 420-453.

51. The fusion protein of any of 31-36, wherein the polypeptide of interest is an anti-inflammatory peptide.

52. The fusion protein of 37, wherein the anti-inflammatory peptide comprises an amino acid sequence set forth in any of SEQ ID NOs: 411-417.

53. The fusion protein of 38, wherein the anti-inflammatory peptide comprises the amino acid sequence RYTVELA (SEQ ID NO: 411) or VTLVGNTFLQSTINRTIGVL (SEQ ID NO: 412).

54. A nucleic acid encoding the fusion protein of any of 31-53.

55. The nucleic acid of 39, wherein the nucleic acid is a plasmid.

56. The nucleic acid of 40, wherein the plasmid comprises an origin of replication that functions in prokaryotic cells other than Bacteroides cells, but does not function in Bacteroides cells.

57. An outer membrane vesicle, comprising the fusion protein of any of 31-53.

58. A method of delivering a polypeptide, comprising: recombinantly expressing the fusion protein of any of 31-53 in a prokaryotic cell; and delivering the fusion protein or the polypeptide of interest outside of the prokaryotic cell.

59. The method of 43, wherein the secreted Bacteroides polypeptide is secreted from the cell.

60. The method of 43, further comprising releasing the polypeptide of interest from the secreted Bacteroides polypeptide.

61. The method of 60, wherein release is performed by a protease.

62. The method of 61, wherein the protease is a gut protease.

63. The method of 61, wherein the protease is a cytoplasmic protease.

64. The method of 61, wherein the protease is a protease found in a cell of a different organism than the prokaryotic cell.

65. The method of 43, further comprising delivering the fusion protein or the polypeptide of interest to a gut.

66. The method of 43, further comprising packaging the fusion protein or the polypeptide of interest into an outer membrane vesicle.

67. The method of 45, further comprising fusing the outer membrane vesicle with a cell membrane of a second cell.

68. The method of 43, further comprising delivering the fusion protein or the polypeptide of interest to a second cell.

69. The method of 47, wherein the second cell is a eukaryotic cell.

70. The method of 47, wherein the second cell is a mammalian cell.

71. A method of delivering a protein to an individual's gut, the method comprising: introducing, into an individual's gut, a Bacteroides cell comprising the nucleic acid of any one of 1-29 and 39-41.

72. The method of 50, wherein the nucleic acid is integrated into the genome of the Bacteroides cell.

73. The method of 50 or 72, wherein the individual has a disease impacted by gut microbiota.

74. The method of 50 or 72, wherein the individual has a disease selected from: obesity, diabetes, heart disease, central nervous system diseases, rheumatoid arthritis, metabolic disorders, and cancer.

75. The method of 50 or 72, wherein the individual has gut inflammation.

76. The method of 50 or 72, wherein the individual has colitis.

77. The method of any of 50-54, wherein the Bacteroides cell is a cell of a species selected from: B. fragilis (Bf), B. distasonis (Bd), B. thetaiotaomicron (Bt), B. vulgatus (Bv), B. ovatus (Bo), B. eggerthii (Be), B. merdae (Bm), B. stercoris (Bs), B. uniformis (Bu), and B. caccae (Bc).

EXAMPLES

The following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how to make and use the present invention, and are not intended to limit the scope of what the inventors regard as their invention nor are they intended to represent that the experiments below are all or the only experiments performed. Efforts have been made to ensure accuracy with respect to numbers used (e.g. amounts, temperature, etc.) but some experimental errors and deviations should be accounted for. Unless indicated otherwise, parts are parts by weight, molecular weight is weight average molecular weight, temperature is in degrees Centigrade, and pressure is at or near atmospheric.

In the examples below, the platform for engineering Bacteroides presented herein adds to an emerging palette of tools that can synergize to add new dimensions to the mechanistic understanding of gut ecology. The work here provides an example of the basic molecular insight into Bacteroides promoter architecture, which required high throughput strain generation. In addition, the work here facilitates defining single cell behavior in the context of the complex and dynamic gut microbial ecosystem. For example, closely related species or isogenic knockout variants can be distinguished and provide a step toward single cell reporting of location specific conditions within the gut (e.g., mammalian gut). The work here also provides tools that can directly be applied to develop therapeutic microbes. High expression from a strain, with secretion and clean release of peptides was applied to developing two therapeutic strains, each successful in treating murine colitis. The compositions and methods provided here for strain manipulation, protein expression and peptide secretion were demonstrated to function predictably across the Bacteroides genus and in different genetic and environmental contexts.

Example 1: Strong Predictable Expression and High-Throughput Modification for the Abundant Gut Commensal Genus, Bacteroides

Applying synthetic biology to engineer gut-resident microbes provides a new avenue to investigate microbe-host interactions, perform diagnostics, and deliver therapeutics. The data presented here demonstrate a platform for engineering Bacteroides, the most abundant genus in Western microbiotas. Using a new high-throughput genomic integration method, a phage promoter was identified and a set of constitutive promoters spanning over four logs of strength was generated. These promoters produce an unprecedented level of expression, confer no fitness burden within the gut over 14 days, function predictably over a million-fold expression range in phylogenetically diverse Bacteroides species, and allowed strains living within the gut to be distinguished from one another by fluorescence.

Results

High-Throughput Strain Modification Method

The NBU2 integration plasmid was adapted for compatibility with Golden Gate cloning to enable rapid and reliable plasmid construction and genomic integration. These modifications allow basic DNA parts to be assembled into expression cassettes on Bacteroides integration plasmids in a one-pot reaction (FIG. 4). A conjugation protocol was also developed that can be executed with 96-well compatible liquid handling steps to improve the throughput of genetic modification. When combined with Golden Gate cloning, the entire process of going from basic parts to colonies of Bacteroides with genomically integrated constructs could be performed in 3 days with high-throughput liquid handling (FIG. 1). To assess the accuracy of this protocol, 40 different 4-piece assemblies were performed and these constructs were genomically integrated into 4 different species of Bacteroides: B. thetaiotaomicron (Bt), B. vulgatus (Bv), B. ovatus (Bo), and B. uniformis (Bu). A success rate of over 99% was achieved using this new pipeline, with similar success rates for each species (FIG. 15, Table 1).

Maximizing Protein Expression

Expression of heterologous proteins in Bacteroides at levels sufficient for detection in vivo has been a substantial challenge. Initial attempts to produce high protein expression in Bt using the 16S rRNA promoter (P_(rRNA)), previously used for high expression, combined with the Bacteroides consensus ribosome binding site (RBS) driving GFP failed to produce fluorescence above background levels. Thus, in an attempt to identify a strong RBS sequence and maximize protein production via translation, three different RBS libraries were designed: an A/G-rich degenerate sequence resembling the reported consensus sequence, a completely degenerate sequence, and an A/T-rich sequence resembling the residues found upstream of B. fragilis (Bf) phage genes. When tested with P_(rRNA) (FIG. 5a) and a fructose-inducible promoter, P_(BT1763) (FIG. 5b), the A/T-rich library sequence, N₉W₃A₃W₂TWANAATAATG (SEQ ID NO: 371), produced substantially stronger expression sequences than the other two libraries, while the A/G-rich library produced even weaker expression than the unbiased degenerate sequence. The phage-based RBS library sequence was similar to the A/T-rich RBSs of highly expressed native Bacteroides genes. Despite the improvements in translation, the highest expression strain produced fluorescence only 40% above background, prompting a search for stronger promoters.
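As an illustration of how large the A/T-rich degenerate library is, the sketch below expands the library sequence N₉W₃A₃W₂TWANAATAATG using the standard IUPAC meanings of N (any base) and W (A or T). The function and variable names are hypothetical and chosen only for this example; the calculation is a theoretical count of possible sequences, not a statement about how many variants were actually built.

```python
import random

# Standard IUPAC codes needed for this library: N = any base, W = A or T.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "W": "AT"}

# N9 W3 A3 W2 T W A N AATAATG, written out position by position.
LIBRARY = "N" * 9 + "W" * 3 + "A" * 3 + "W" * 2 + "TWAN" + "AATAATG"


def library_size(degenerate):
    """Number of distinct sequences encoded by a degenerate sequence."""
    size = 1
    for code in degenerate:
        size *= len(IUPAC[code])
    return size


def sample_member(degenerate, rng):
    """Draw one concrete sequence uniformly from the degenerate library."""
    return "".join(rng.choice(IUPAC[code]) for code in degenerate)


def is_member(seq, degenerate):
    """True if seq is one of the sequences encoded by the degenerate sequence."""
    return len(seq) == len(degenerate) and all(
        base in IUPAC[code] for base, code in zip(seq, degenerate)
    )


if __name__ == "__main__":
    # 28 positions; 10 N and 6 W positions give 4**10 * 2**6 = 67,108,864 sequences.
    print(len(LIBRARY), library_size(LIBRARY))
    print(sample_member(LIBRARY, random.Random(0)))
```

The theoretical library size far exceeds what can be screened exhaustively, which is why a screen of such a library samples only a small subset of its members.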

Seventeen sequences with a high identity to the Bacteroides promoter consensus sequence (found within either of two phage genomes) were synthesized and tested to identify a strong promoter. The length of the highest strength phage promoter was varied and an upstream intrinsic terminator was used for reduced context dependence. The promoter sequence from −100 to +20 from the putative transcription start site, based on homology, produced the highest expression (FIG. 6). This phage promoter, here termed P_(BfP1E6) (SEQ ID NO: 8), was compared to P_(rRNA), the two strongest native Bt promoters identified from available transcriptional profiling data (P_(BT1830) and P_(BT4615)), and the strongest promoter from a recent publication on synthetic biology tools for Bt, P_(BT1311) (Parker et al., Plasmid. 2012 September; 68(2):86-92). For each promoter tested, an A/T-rich RBS library of 192 RBSs was screened and the strongest RBS constructs for each promoter were compared. Strains with P_(BfP1E6)-driven expression produced fluorescence approximately 10-fold higher than the next highest promoter, P_(BT4615), 40-fold higher than P_(BT1311), and 70-fold higher than P_(rRNA) (FIG. 2a, black bars). This was repeated using the RBS optimized for the phage promoter with each other promoter, giving similar results (FIG. 2a, grey bars). Although recent published attempts to express detectable levels of GFP in Bacteroides have been unsuccessful, the use of both the A/T-rich RBS library and the phage promoter yielded strong GFP expression from a single genomic integration that can be easily detected by eye under UV light (FIG. 7).

Characterizing the Phage Promoter

Assessed next was how reliably the phage promoter functions in the gut and whether high protein expression results in a loss of Bt fitness. In culture, a 50:50 mix of the high GFP expression (P_(BfP1E6)) Bt strain and a non-expressing control Bt strain showed no significant difference in relative abundance after four successive growth cycles (FIG. 8). Next, a 50:50 mix of the two strains was inoculated into germ-free mice (n=5), to assess the fitness burden of high, constitutive protein expression in vivo. No difference in abundance between the strains was observed over the course of 14 days, with a small reduction from 50% to 35% during the next eight weeks (FIG. 2b). Imaging of the distal colon at day 71 post-colonization revealed a strong endogenous GFP fluorescence signal in ~37% of the Bt (FIG. 2c and FIG. 9). Achieving high expression with this minimal fitness burden enables a wide range of novel applications, including detection of reporter expression with in vivo imaging. To understand transcriptional variability of the phage promoter under different in vivo and in vitro conditions, transcript levels were measured at different growth phases in culture and from different locations in gnotobiotic mice. Transcripts, measured via qPCR, from P_(BfP1E6) were relatively similar in all gut locations and culture conditions tested with less than a 4-fold maximum difference, while P_(rRNA) transcripts decreased more than 40-fold between mid-log and late-log growth phases (FIG. 10).
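One way to put the in vivo competition result on a per-day scale is the selection rate commonly used in two-strain competition assays: the change in the natural log of the strain ratio divided by elapsed time. The sketch below applies it to the figures quoted above, assuming (this is an assumption, not stated explicitly in the text) that the 50% and 35% values are the fraction of the GFP-expressing strain and that the decline is spread evenly over the 56 days. The function names are hypothetical.

```python
import math


def log_ratio(fraction):
    """Natural log of the ratio expressing / non-expressing for a given fraction."""
    return math.log(fraction / (1.0 - fraction))


def selection_rate_per_day(f_start, f_end, days):
    """Change in ln(strain ratio) per day between two time points."""
    return (log_ratio(f_end) - log_ratio(f_start)) / days


if __name__ == "__main__":
    # Assumed inputs: 50% at day 14, 35% eight weeks (56 days) later.
    rate = selection_rate_per_day(0.50, 0.35, 56)
    print(f"selection rate: {rate:.4f} per day")
```

Under these assumptions the rate is about −0.011 per day, a shallow slope that is consistent with the description of the burden as small.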

To characterize how changes in the phage promoter sequence influence expression levels, Bt strains were constructed that each expressed GFP with a single mutation in the promoter, for 94% of all possible mutations in the 76 residues upstream of the transcription start site (FIG. 2d). Of the 214 strains constructed, no single mutation significantly increased expression, suggesting that native sequence achieves a local optimum for expression. Based on previous literature, mutations in the residues between −4 and −54, particularly in the −7 and −33 regions (FIG. 2d, highlighted in blue), were expected to most influence promoter activity. However, the −33 position was far less important than expected, and previously uncharacterized sequences at −49 to −53 and −60 to −64 (FIG. 2d, highlighted in red) were important for promoter activity. Consistent with these data, the −51 region appears to be more highly conserved in native Bt promoters than the −33 region (FIG. 11a). The region upstream of the −33 is expected to contain the UP-element, which remains to be characterized in the Bacteroidetes phylum. The spacing of the GTTAA motifs within these two newly identified regions is consistent with the proximal (~−42) and distal (~−52) UP-elements of E. coli, but shifted in location by approximately 10 nucleotides. Table 5 depicts results from the above experiments.
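The motif spacing described above can be checked with a regular expression. The sketch below encodes the pattern GTTAA (n)₃₋₇ GTTAA (n)₃₆₋₃₈ TA (n)₂ TTTG and applies it to the 76 wild-type bases upstream of the transcription start site. That sequence is an illustrative reconstruction read off the mutant labels reported in Table 5 (for example, "T-76A" implies a wild-type T at −76); it is not copied from the sequence listing, and the group and function names are hypothetical.

```python
import re

# Wild-type bases for positions -76 to -1, reconstructed from the mutant
# labels in Table 5. Illustrative only; not a copy of SEQ ID NO: 150.
WT_UPSTREAM = (
    "TTGTTTGCAATG"                            # -76 .. -65
    "GTTAA"                                   # -64 .. -60
    "TCTATT"                                  # -59 .. -54
    "GTTAA"                                   # -53 .. -49
    "AATTTAAAGTTTCACTTGAACTTTCAAATAATGTTCT"   # -48 .. -12
    "TA"                                      # -11 .. -10
    "TA"                                      # -9 .. -8
    "TTTG"                                    # -7 .. -4
    "CAG"                                     # -3 .. -1
)

# GTTAA (n)3-7 GTTAA (n)36-38 TA (n)2 TTTG, with n restricted to A, C, G, T.
MOTIF = re.compile(
    r"(?P<upstream_gttaa>GTTAA)[ACGT]{3,7}"
    r"(?P<downstream_gttaa>GTTAA)[ACGT]{36,38}"
    r"(?P<ta>TA)[ACGT]{2}"
    r"(?P<tttg>TTTG)"
)


def motif_positions(seq):
    """Return the start of each motif element relative to the sequence's 3' end.

    The last base of seq is treated as position -1, so index i maps to
    i - len(seq). Returns None when the spaced motif is absent.
    """
    match = MOTIF.search(seq)
    if match is None:
        return None
    return {name: match.start(name) - len(seq) for name in MOTIF.groupindex}


if __name__ == "__main__":
    print(motif_positions(WT_UPSTREAM))
```

For the reconstructed sequence, the sketch reports the two GTTAA motifs starting at −64 and −53, the TA at −11, and the TTTG at −7, which matches the −60 to −64 and −49 to −53 regions highlighted above.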

TABLE 5 Strength of various tested promoter sequences (listed mutants are relative to the wild type sequence set forth in SEQ ID NO: 150).

Mutant   Strength   SEQ ID NO
WT       1.00       150
T-76A    0.92       151
T-76C    0.81       152
T-76G    0.86       153
T-75A    0.91       154
T-75C    0.94       155
T-75G    0.81       156
G-74A    0.86       157
G-74C    0.87       158
G-74T    0.88       159
T-73C    0.80       160
T-73G    0.82       161
T-72A    0.84       162
T-72C    0.86       163
T-72G    0.80       164
T-71A    0.90       165
T-71C    0.85       166
T-71G    0.82       167
G-70A    0.89       168
G-70C    0.80       169
G-70T    0.92       170
C-69A    0.92       171
C-69G    0.75       172
C-69T    0.88       173
A-68C    0.78       174
A-68G    0.88       175
A-68T    0.81       176
A-67C    0.84       177
A-67G    0.85       178
A-67T    0.89       179
T-66A    0.93       180
T-66C    0.81       181
T-66G    0.84       182
G-65A    0.87       183
G-65C    1.00       184
G-65T    0.93       185
G-64A    0.56       186
G-64T    0.61       187
T-63C    0.48       188
T-63G    0.37       189
T-62A    0.53       190
T-62G    0.55       191
A-61C    0.35       192
A-61G    0.33       193
A-61T    0.44       194
A-60C    0.56       195
A-60G    0.41       196
A-60T    0.64       197
T-59A    0.95       198
T-59C    0.80       199
T-59G    0.82       200
C-58A    0.90       201
C-58G    0.87       202
C-58T    0.85       203
T-57A    0.85       204
T-57C    0.91       205
T-57G    0.96       206
A-56C    0.93       207
A-56G    0.81       208
A-56T    0.94       209
T-55A    0.93       210
T-55C    0.89       211
T-55G    0.87       212
T-54A    0.93       213
T-54C    0.85       214
T-54G    0.85       215
G-53A    0.61       216
G-53C    0.68       217
G-53T    0.75       218
T-52A    0.59       219
T-52C    0.53       220
T-51A    0.74       221
T-51C    0.76       222
T-51G    0.52       223
A-50G    0.22       224
A-50T    0.31       225
A-49C    0.10       226
A-49G    0.24       227
A-49T    0.34       228
A-48C    0.73       229
A-48G    0.72       230
A-48T    0.79       231
A-47C    0.90       232
A-47G    0.97       233
A-47T    0.96       234
T-46A    0.94       235
T-46C    0.94       236
T-46G    0.91       237
T-45A    0.99       238
T-45C    0.95       239
T-45G    0.92       240
T-44A    1.05       241
T-44G    0.99       242
A-43C    0.94       243
A-43G    0.88       244
A-43T    0.85       245
A-42G    0.97       246
A-42T    0.86       247
A-41C    0.65       248
A-41G    0.88       249
A-41T    0.95       250
G-40A    0.99       251
G-40C    0.96       252
G-40T    0.87       253
T-39A    0.99       254
T-39C    0.93       255
T-39G    0.90       256
T-38A    0.87       257
T-38C    0.85       258
T-38G    0.88       259
T-37A    0.93       260
T-37G    0.91       261
C-36A    0.91       262
C-36G    0.98       263
C-36T    0.91       264
A-35G    0.97       265
A-35T    0.91       266
C-34A    0.78       267
C-34G    0.92       268
C-34T    0.91       269
T-33A    0.78       270
T-33C    0.82       271
T-33G    0.87       272
T-32A    0.97       273
T-32C    0.90       274
T-32G    0.93       275
G-31A    0.93       276
G-31C    0.80       277
G-31T    0.89       278
A-30C    0.94       279
A-30G    0.92       280
A-30T    0.94       281
A-29C    0.91       282
A-29G    0.93       283
A-29T    0.94       284
C-28A    0.95       285
C-28G    0.97       286
C-28T    0.91       287
T-27A    0.96       288
T-27C    0.80       289
T-27G    0.88       290
T-26A    0.93       291
T-26C    0.86       292
T-26G    0.88       293
T-25A    0.92       294
T-25C    0.97       295
T-25G    0.82       296
C-24A    0.91       297
C-24G    0.90       298
C-24T    0.80       299
A-23C    0.94       300
A-23G    1.00       301
A-23T    0.92       302
A-22C    0.86       303
A-22G    0.89       304
A-22T    0.88       305
A-21C    0.84       306
A-21G    0.88       307
A-21T    1.03       308
T-20A    1.00       309
T-20G    0.90       310
A-19C    0.83       311
A-19G    0.83       312
A-19T    0.94       313
A-18C    0.98       314
A-18G    0.99       315
A-18T    0.98       316
T-17A    0.91       317
T-17C    0.89       318
T-17G    0.83       319
G-16A    0.87       320
G-16C    1.03       321
G-16T    0.81       322
T-15A    0.88       323
T-15C    0.81       324
T-15G    0.95       325
T-14A    0.95       326
T-14C    0.99       327
T-14G    1.08       328
C-13A    0.92       329
C-13G    0.94       330
C-13T    0.94       331
T-12A    0.90       332
T-12C    0.97       333
T-12G    0.97       334
T-11A    0.26       335
T-11C    0.71       336
T-11G    0.69       337
A-10C    0.02       338
A-10G    0.00       339
A-10T    0.54       340
T-9A     0.95       341
T-9C     0.92       342
T-9G     0.80       343
A-8C     0.93       344
A-8G     0.81       345
A-8T     0.92       346
T-7A     0.51       347
T-7C     0.41       348
T-6A     0.08       349
T-6C     0.02       350
T-6G     0.00       351
T-5A     0.37       352
T-5G     0.24       353
G-4A     0.40       354
G-4C     0.05       355
G-4T     0.05       356
C-3G     1.07       357
C-3T     0.77       358
A-2C     0.93       359
A-2G     0.92       360
A-2T     0.80       361
G-1A     0.91       362
G-1C     0.97       363
G-1T     1.04       364

Heterologous Transcription by the Phage Promoter Exceeds Levels Obtainable by the Strong Native rRNA Promoter

Bacteroides harboring a cassette for expressing GFP driven by either the phage promoter (SEQ ID NO: 8) or the ribosomal RNA promoter (SEQ ID NO: 511, ggctacttttgcacccgctttccaagagaagaaagccttgataaattgacttagtgtaaaagcaagtgtctgcttaaccataagaacaaaaaaacttccgataaagtttggaagataaagctaaaagttcttatctttgcagtccgattcgcaaagaaaaggtgttacgcttttcttctttaccttctttccctttcgctaagagagcctgaaaaacgatagaaaaagaaaaacgaaaaaaaaacttccgaaaatatttggtagttaaaataaaacctcttacctttgcacccgcttttaaaacgaaagcaagatgttctttgaaatattgataaacaatacaagtagtacaagaaaaaaatagaaccgtcaatacttgtcttatatgtagtaatatgtatgagtcataaggtattaatgaagtcaataaattgtacggcatcctgaacagagcaaaaatcagctttatgctgactaacaatacttttacaatgaagagtttgatcctggctcag) were grown in vitro and in vivo as described herein. To compare the strength of the phage promoter to the ribosomal RNA promoter, a native promoter that is expected to be among the most highly expressed native promoters, transcription rates of each promoter were determined via RT-qPCR as described herein. In all measured gut locations and in saturated culture conditions, transcripts produced from the phage promoter significantly exceeded those produced from the ribosomal RNA promoter (FIG. 10).

Heterologous Protein Driven by the Phage Promoter Exceeds Levels Achieved with the Strongest Native Promoters by Ten-Fold

To achieve high levels of protein expression, a strong RBS was used in addition to a strong promoter. Strong RBSs were generated by screening an RBS library with a motif based on the RBSs found in a Bacteroides-specific phage (SEQ ID NO: 375). Expression of fluorescent proteins from Bacteroides had not previously been reported; however, use of the RBS library (SEQ ID NO: 375) increased expression from the rRNA promoter to 38% above the background autofluorescence of unmodified cells when measured as described herein. Screening a number of additional native promoters produced higher expression, including up to a 950% increase in fluorescence relative to the autofluorescence of unmodified cells. Fluorescence from GFP driven by the phage promoter, however, exceeded that produced by any of the strong native promoters tested by ten-fold, with an approximately 9500% increase in fluorescence relative to the autofluorescence of unmodified cells.

Heterologous Protein Expression from the Phage Promoter Produces Approximately 14,000 nM of Cytoplasmic Protein

To determine the absolute protein expression level achievable with the phage promoter, a standard curve was generated with purified luciferase protein of a known concentration and compared to luciferase driven by the phage promoter and several variants (SEQ ID NOs: 1-8), as described herein. The protein concentrations from these constructs range from approximately 0.5 to 14,000 nM. Since the phage promoter is approximately ten times the strength of any measured native promoter, cytoplasmic protein concentrations of 1,400 nM or less are expected to be achievable by native promoters.

Generating Predictably Functioning Promoter Variants

Using data from the mutational analysis, a set of eight constitutive promoters was created that spans a 30,000-fold expression range by introducing single or multiple mutations in P_(BfP1E6) (FIG. 2E). As a complementary means of controlling expression levels, eight RBSs spanning more than 5 orders of magnitude were also generated (FIG. 2F). As protein expression level is the product of promoter and RBS strength, in combination these promoters and RBSs give a theoretical ten-billion-fold expression range, well beyond the range of highly sensitive assays. The eight constitutive promoters in this set differ by only a few residues upstream of transcription initiation and thus are expected to function predictably when driving different protein-RBS combinations. Because core transcriptional and translational machinery is highly conserved, these expression tools should function predictably across the entire Bacteroides genus.

Fifty-six promoter-RBS combinations were constructed (promoters of SEQ ID NOs: 2-8, in combination with RBSs of SEQ ID NOs: 11-18, in all pairwise combinations, e.g., see Table 4 above) and luciferase expression of genomically integrated constructs in four species, Bt, Bv, Bo and Bu, was measured to determine the extent of predictable expression. The expected expression level for the >200 strains was calculated by multiplying the relative promoter and RBS strengths determined in Bt (FIG. 2E-2F). A high correlation was found between expected and measured expression over a million-fold range in all four species, with R² ranging from 0.95 in Bt to 0.89 in Bv (FIG. 3A). Additionally, the promoters produce the expected relative levels of GFP in Bt, Bv, Bo, Bf, and Bacteroides eggerthii (Be) (FIG. 12).
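The expected-versus-measured comparison described above can be sketched as follows. This is a minimal illustration, not the analysis actually used: the strength values are placeholders rather than data from FIG. 2E-2F, the function names are hypothetical, and computing R² on log10-transformed values is an assumption made here because the data span many orders of magnitude.

```python
import math

def expected_expression(promoter_strengths, rbs_strengths):
    """Multiply each relative promoter strength by each relative RBS strength."""
    return {
        (p_name, r_name): p_val * r_val
        for p_name, p_val in promoter_strengths.items()
        for r_name, r_val in rbs_strengths.items()
    }

def r_squared_log10(expected, measured):
    """Squared Pearson correlation between log10(expected) and log10(measured)."""
    keys = sorted(expected)
    x = [math.log10(expected[k]) for k in keys]
    y = [math.log10(measured[k]) for k in keys]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

# Hypothetical relative strengths (placeholders, not measured values).
promoters = {f"promoter_{i}": 10.0 ** -(i * 0.7) for i in range(7)}
rbss = {f"rbs_{j}": 10.0 ** -(j * 0.7) for j in range(8)}

expected = expected_expression(promoters, rbss)
print(len(expected))  # 56 promoter-RBS combinations (7 promoters x 8 RBSs)
```

Given a dictionary of measured luciferase values keyed the same way, `r_squared_log10(expected, measured)` returns the kind of agreement statistic reported per species.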

Endogenous Fluorescent Imaging in the Mouse Gut

Six different Bacteroides species were engineered using the above panel of promoters to produce a unique fluorescent signature that could be imaged in vivo. One of three levels of GFP expression plus one of two levels of mCherry expression were genomically integrated into each species. Strain-level differentiation in mixed communities, which is difficult using established methods such as fluorescence in situ hybridization (FISH), was achieved at the single cell level (FIG. 3B). Either the full set of six engineered species or a subset of three species were next introduced into germ-free mice. After 14 days of colonization, mice were sacrificed, distal colon sections were imaged and single-cell fluorescent profiles were quantified (FIG. 3C, FIG. 13 and FIG. 14). Comparison of the six-species and three-species communities indicated a low cell identification error (~6%) in the six-member community (FIG. 14). Transformation of fluorescent signatures enabled visual differentiation of six co-residing Bacteroides species within the gut (FIG. 3D-E). Bacteroides species differentially localized in dietary plant material within the gut at one day post-colonization (FIG. 3F), demonstrating the utility of fluorescent-protein-expressing species along with conventional staining methods in detailed investigations of spatial and temporal microbiota dynamics.

Materials and Methods

High-throughput plasmid construction, conjugation and integration. Basic part plasmids were created by cloning each part, flanked with the BsaI restriction site and 4-base overhangs specified in FIG. 4, into a standard cloning vector, pWW3056, using NotI/SbfI restriction sites. See FIG. 15 and FIG. 16 (Tables 2 and 3) for a list of oligonucleotides, basic part plasmids, and their corresponding sequences, respectively. Golden Gate reactions were carried out according to standard procedures, using any combination of basic part plasmids above, synthesized sequences, PCR products, or PNK-treated annealed oligonucleotides (annealed to generate BsaI digestion equivalent overhangs). Completed Golden Gate reactions of 4 μL were transformed with addition of 20 μL of chemically competent E. coli S17-1 cells (mid-log cells resuspended 1:20 in TSS/KCM: LB medium with 8.3% PEG-3350, 4.2% DMSO, 58 mM MgCl₂, 167 mM CaCl₂ and 457 mM KCl), followed by a 90 second heat shock at 42° C., recovery at 37° C. for 30 minutes, a dilution into 600 μL LB medium with ampicillin (150 μg/mL) in a deep well 96-well plate (Corning 07-200-700) and aerobic growth at 37° C. A Bacteroides culture was prepared with overnight anaerobic growth in trypticase yeast extract-glucose (TYG) growth medium. At mid to late log growth, 200 μL of the transformed S17-1 cells were spun down, resuspended with 10 μL of a 1:10 concentration of the Bacteroides culture, and added to a deep well 96-well plate containing 400 μL of solidified Brain Heart Infusion Blood Agar (BHI-BA) per well. After at least 16 hours, the lawn of S17-1 and Bacteroides was resuspended in 400 μL of TYG by vortex or pipetting, 200 μL of the resuspension was spun down and resuspended in 15 μL TYG, and several dilutions in TYG were made. 3 μL of the resuspension and its dilutions were spotted onto a 120×120 mm square petri dish containing BHI-BA plus the appropriate antibiotics (200 μg/mL gentamycin, and 25 μg/mL erythromycin or 2 μg/mL tetracycline). Of the species tested here, Bf produces the fewest and Bv produces the most transformants. Bacteroides colonies can be picked after a 24 hour anaerobic incubation at 37° C.

Assessing high-throughput cloning and genomic integration pipeline success rates. The likelihood of obtaining a colony with a correctly assembled, integrated plasmid was extracted from phenotypic data (FIG. 3A). The 40 constructs that produce within 10,000-fold of the maximum expression were considered for each of the four species. Four biological replicate Bacteroides colonies were picked for each construct within each species, and each was expected to be derived from a uniquely generated plasmid since conjugation to E. coli transformants was performed in batch. Replicates with a deviation from the median by at least an order of magnitude were considered to be incorrectly assembled. All such misassemblies were at least 50-fold lower than expected and close to background levels of luminescence. Samples with substantially lower growth as determined by OD_(600nm) were excluded from the analysis, although inclusion of wells with little or no growth only substantially impacted Bv calculations, with a reduction to 90% correct.
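The replicate-screening rule described above (a replicate is called misassembled when it deviates from the median of its construct by at least an order of magnitude) can be sketched as below. The function name, the symmetric treatment of high and low deviations, and the example luminescence values are assumptions for illustration only.

```python
import statistics

def flag_misassembled(replicates, fold=10.0):
    """Return one boolean per replicate: True if it deviates from the median by >= fold.

    Assumes all values are positive (e.g. background-inclusive luminescence readings).
    """
    median = statistics.median(replicates)
    flags = []
    for value in replicates:
        ratio = max(value, median) / min(value, median)
        flags.append(ratio >= fold)
    return flags

# Hypothetical luminescence readings for four biological replicates of one construct.
readings = [1.0e6, 1.2e6, 0.9e6, 1.5e4]
flags = flag_misassembled(readings)
fraction_correct = flags.count(False) / len(flags)
print(flags)             # [False, False, False, True]
print(fraction_correct)  # 0.75
```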

Culture reporter expression and fluorescent assays. To assay Bacteroides strain reporter activity, glycerol stocks of Bacteroides strains were streaked out on BHI blood agar plates with the appropriate antibiotics (200 μg/mL gentamycin, and 25 μg/mL erythromycin or 2 μg/mL tetracycline), and after a 24-30 hour anaerobic incubation at 37° C., at least 3 colonies were picked into TYG with antibiotics (25 μg/mL erythromycin or 2 μg/mL tetracycline) and grown anaerobically at 37° C. for 14-20 hours. Endogenous fluorescence from super-folding GFP and mCherry was measured after twice spinning cultures down and resuspending in PBS, followed by oxygen exposure for at least 60 minutes. The Nano-Glo Luciferase Assay System (Promega) was used for luciferase assays. Fluorescence, OD₆₀₀ and luminescence readings were taken on a TECAN Infinite 200 PRO microplate reader with a 5 nm band pass excitation/emission of 488/510 and 580/610 nm for GFP and mCherry, respectively.

Gnotobiotic mouse experiments. Mouse experiments in this study were performed in strict accordance with a Protocol for Care and Use of Laboratory Animals approved by the Stanford University Administrative Panel of Laboratory Animal Care. Germ-free Swiss Webster mice (Taconic) were maintained in gnotobiotic isolators on a 12 hour light cycle and fed ad libitum a standard autoclaved chow diet (LabDiet 5K67). Mice were inoculated via oral gavage with ~10⁸ total Bacteroides CFU, either a single strain or equal proportions of mixed strains. Fecal pellets were plated on BHI-BA with gentamycin and erythromycin, grown at least 24 hours, and individual colonies were picked for fluorescent assay based enumeration. After one day (FIG. 3F), 2 weeks (FIG. 3D-H) or 10 weeks (FIG. 2C), mice were sacrificed using CO₂ asphyxiation and cervical dislocation in accordance with approved protocols, and tissue was immediately harvested and processed as described below.

Fitness assays. Culture fitness assays were conducted by streaking out glycerol stocks of GFP-expressing or non-expressing Bt, picking two colonies of each and growing in TYG+erythromycin (25 μg/mL) overnight, subculturing each strain and growing to mid-log, and then independently combining the two sets of cultures at a 1:1 ratio followed by growth to stationary phase. Each day for 4 days the cultures were subsequently diluted 1:100 for overnight growth, then diluted 1:50 and sampled at mid-log during growth to stationary phase. At each mid-log timepoint, the cultures were sampled, centrifuged at 14,000×g, resuspended in an equal volume of PBS, and assayed for bulk GFP fluorescence relative to purely GFP-expressing or non-expressing cultures. In vivo fitness experiments were conducted by similarly preparing a mix of the two strains from overnight culture, and inoculating and maintaining mice as described above. Bacterial densities were determined using serial dilution of samples taken from fecal pellets of each mouse three times a week. Forty-eight colonies for each mouse at each timepoint were picked and assayed for fluorescence as described above, and weekly data were averaged for each mouse to provide an average proportion of GFP-expressing Bt for each mouse each week.
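One way the two readouts described above could be turned into strain proportions is sketched below. The linear interpolation between pure-culture fluorescence values is an assumption (the text states only that bulk fluorescence was assayed relative to pure cultures), and the function names and numbers are hypothetical.

```python
def gfp_fraction(mixed, pure_gfp, pure_control):
    """Estimate the GFP-expressing fraction of a mixed culture from bulk fluorescence.

    Assumes fluorescence scales linearly with the proportion of GFP-expressing cells.
    """
    span = pure_gfp - pure_control
    if span <= 0:
        raise ValueError("pure GFP culture must be brighter than the control")
    fraction = (mixed - pure_control) / span
    return min(1.0, max(0.0, fraction))  # clamp measurement noise to [0, 1]

def weekly_average(colony_counts):
    """Average the GFP-positive proportion over the timepoints sampled in one week.

    colony_counts: list of (n_gfp_positive, n_colonies_picked) tuples.
    """
    fractions = [positive / picked for positive, picked in colony_counts]
    return sum(fractions) / len(fractions)

# Hypothetical values: bulk fluorescence in arbitrary units, then three
# timepoints in one week with 48 colonies picked per timepoint.
print(gfp_fraction(mixed=550.0, pure_gfp=1000.0, pure_control=100.0))  # 0.5
print(weekly_average([(24, 48), (18, 48), (12, 48)]))                  # 0.375
```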

Transcript measurements. RNA was isolated with RNeasy kits (Qiagen) applied either to cecal or fecal contents treated with phenol-chloroform and bead beating, or to cultures treated with RNAprotect (Qiagen) and lysozyme, as previously described. RNA was converted to cDNA with Superscript II (Invitrogen) followed by qRT-PCR analysis with SYBR Green (ABgene) in an MX3000P thermocycler (Stratagene). The normalized transcript levels, GFP/16S, were determined by amplification of GFP and 16S, with primers tggtgttcagtgctttgctc (SEQ ID NO: 376)/agctcaatgcggtttaccag (SEQ ID NO: 377) and cgttccattaggcagttggt (SEQ ID NO: 378)/caacccatagggcagtcatc (SEQ ID NO: 379), respectively.

Mutational analysis of phage promoter. For each promoter variant assayed, a unique strain was generated, as described above, using a three-piece Golden Gate assembly of a pair of PNK-treated annealed oligonucleotides manufactured by Integrated DNA Technologies (typically two 28-base-pair oligonucleotides) containing the specific mutation, combined with upstream and downstream plasmid parts to create an expression plasmid identical to pWW3452 but with a single promoter mutation. The assembly and integration process was repeated three independent times to better identify outliers in expression due to errors in plasmid synthesis. All strains producing less than 75% of the native promoter activity were sequence verified with PCR from genomic DNA and Sanger sequencing. 98% of the verified mutations outside of the highlighted regions of importance (FIG. 2D) produced over 75% of P_(BfP1E6)-driven fluorescence.

Absolute luciferase expression quantification. A standard curve for quantifying luciferase concentration was produced using purified luciferase protein (Promega; NanoLuc-Halotag Protein, 100 μg; Item #: CS188401). The luciferase protein (8 μg/μl; 54.2 kDa) was diluted either 1:2,000 or 1:20,000 into PBS+BSA, and serially diluted (1:4) in PBS+BSA. Luminescence was measured with the Nano-Glo Luciferase Assay System (Promega), and dilutions of between 8×10³ and 8.2×10⁷ produced readings within the linear range (FIG. 11B). Simultaneously, cultures were grown in triplicate and similarly assayed for luciferase as described above, as well as plated, giving on average 5×10⁶ CFU/μl. Cells harboring the strongest phage promoter, when diluted 1:400, produced luminescence corresponding to 10 pg/μl (0.18 nM) of purified protein. Assuming a volume of approximately 1 μm³ per cell, corresponding to a total intracellular volume of 0.5% of the culture volume, the intracellular concentration of luciferase is expected to be approximately 14 μM (calculated as: 0.18 nM×400/0.5%). Concentrations for the strains harboring the other seven promoters were similarly calculated and plotted in FIG. 11C.
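The arithmetic behind the 14 μM estimate can be reproduced with the values stated above. The function names are hypothetical, and the unit conversion is included only as a cross-check: 0.18 nM of a 54.2 kDa protein corresponds to roughly 10 pg/μl.

```python
UM3_PER_UL = 1e9  # 1 µl = 1 mm³ = 10⁹ µm³

def intracellular_fraction(cfu_per_ul, cell_volume_um3):
    """Fraction of the culture volume occupied by cells."""
    return cfu_per_ul * cell_volume_um3 / UM3_PER_UL

def intracellular_concentration_nM(assay_nM, dilution_factor, fraction):
    """Back-calculate the in-cell concentration from a diluted whole-culture reading."""
    return assay_nM * dilution_factor / fraction

def molar_to_pg_per_ul(conc_nM, mass_kda):
    """nM x kDa = 1e-9 mol/L x 1e3 g/mol = 1e-6 g/L, which equals 1 pg/µl."""
    return conc_nM * mass_kda

fraction = intracellular_fraction(cfu_per_ul=5e6, cell_volume_um3=1.0)
conc_nM = intracellular_concentration_nM(assay_nM=0.18, dilution_factor=400, fraction=fraction)

print(fraction)                                 # 0.005, i.e. 0.5% of culture volume
print(round(conc_nM))                           # 14400 nM, i.e. about 14 µM
print(round(molar_to_pg_per_ul(0.18, 54.2), 1)) # 9.8 pg/µl
```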

Tissue preparation and microscopy. Harvested tissues were immediately transferred to a 4% paraformaldehyde solution in PBS for a 48 hour fixation. Samples were then embedded in O.C.T. Compound (Tissue-Tek) and sectioned to either 4 μm (FIG. 2C) or 100 μm thickness (FIG. 3D-F) on a Leica CM3050 S cryostat. 4 μm sections were fully dried; 100 μm sections were immediately processed without drying. All samples were stained for 45 minutes with 4′,6-diamidino-2-phenylindole dihydrochloride (DAPI; Sigma-Aldrich) and Alexa Fluor 594 Phalloidin (Life Technologies), and 100 μm sections were also stained with Fluorescein labeled Ulex Europaeus Agglutinin I (UEAI; Vector Laboratories), followed by a PBS wash and mounting in VECTASHIELD (Vector Laboratories). Images were taken on a Zeiss LSM 700 confocal microscope using lambda mode to obtain independent spectral profiles for each of the 488 nm, 555 nm and 639 nm lasers.

Image processing and transformation. Linear unmixing was applied to each spectral profile independently to separate the following channels: DAPI, GFP, UEAI, mCherry, and Phalloidin for FIG. 2C-E, and DAPI, GFP, mCherry and autofluorescent plant material for FIG. 2F. Linear deconvolution was applied (ImageJ plugin Diffraction PSF 3D by Bob Dougherty) to all channels except UEAI and plant material, and the default ImageJ despeckling plugin was applied. To generate the single cell expression profiles (FIG. 3B-C), the deconvolved DAPI image was thresholded, a mask was generated for lumen-side objects of approximately bacteria size (0.1 to 1 μm²), and a watershed algorithm was applied to help separate contacting cells. Then the average GFP and mCherry value was determined for each object (single bacterial cell) and plotted with Matlab. To visually distinguish log-separated GFP values, thresholds were chosen based on the GFP/mCherry single-cell fluorescent profiles, to transform the following GFP/mCherry categories to unique colors: low/low=blue; medium/low=cyan; high/low=green; low/high=red; medium/high=orange; high/high=yellow. Additionally, to better visualize ambiguity in category calls, values within 1.75-fold and 6-fold of the GFP and RFP thresholds, respectively, are colored grey. Each pixel was independently transformed to the value determined by the GFP/mCherry category, multiplied by the DAPI value, and overlaid with the UEAI and Phalloidin channels (FIG. 2D-E) or plant material (FIG. 2F). Cells containing more than 25% pixels of another category or near threshold values (grey pixels) are considered to be ambiguous calls.
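The category-to-color transformation described above can be sketched as a small classifier. This is an illustration only: the threshold values are hypothetical, the function names are introduced here, and "within 1.75-fold (or 6-fold) of a threshold" is interpreted as a symmetric window around each threshold, which is an assumption.

```python
# Color assigned to each (GFP level, mCherry level) category, as listed in the text.
COLORS = {
    ("low", "low"): "blue",
    ("medium", "low"): "cyan",
    ("high", "low"): "green",
    ("low", "high"): "red",
    ("medium", "high"): "orange",
    ("high", "high"): "yellow",
}

def near(value, threshold, fold):
    """True if value lies within `fold` of the threshold on either side."""
    return threshold / fold <= value <= threshold * fold

def classify(gfp, mcherry, gfp_low_med, gfp_med_high, mcherry_cut):
    """Map a GFP/mCherry intensity pair to a display color, or grey if ambiguous."""
    if (near(gfp, gfp_low_med, 1.75)
            or near(gfp, gfp_med_high, 1.75)
            or near(mcherry, mcherry_cut, 6.0)):
        return "grey"
    if gfp < gfp_low_med:
        gfp_level = "low"
    elif gfp < gfp_med_high:
        gfp_level = "medium"
    else:
        gfp_level = "high"
    mcherry_level = "low" if mcherry < mcherry_cut else "high"
    return COLORS[(gfp_level, mcherry_level)]

# Hypothetical thresholds in arbitrary fluorescence units.
THRESHOLDS = dict(gfp_low_med=100.0, gfp_med_high=10000.0, mcherry_cut=1000.0)

print(classify(10.0, 10.0, **THRESHOLDS))       # blue
print(classify(1000.0, 50000.0, **THRESHOLDS))  # orange
print(classify(120.0, 10.0, **THRESHOLDS))      # grey
```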

Example 2: Promoter Tests

Assays were performed to test the ability of various sequences tofunction as promoters in Bacteroides cells (see Table 6 and Table 7 forresults).

TABLE 6 Promoter activity assay. Promoter “P6” refers to the phage promoter identified in Example 1 above, and SEQ ID NOs: 388-394 are various truncated versions of the promoter sequence of SEQ ID NO: 8. The underlined nucleotides are those that are added relative to the sequence of SEQ ID NO: 399. P5 is a different phage promoter sequence identified during the experiments described in Example 1 above.

Promoter      Avg Activity   95% CI   Length (nt)   Sequence   SEQ ID NO
blank cells   1.0   0.1   0
P6(−36,+1)    1.1   0.0   37    cacttgaactttcaaataatgttcttatatttgcagt   399
P6(−54,+1)    6.2   1.3   55    tgttaaaatttaaagtttcacttgaactttcaaataatgttcttatatttgcagt   389
P6(−56,+1)    4.7   0.4   57    attgttaaaatttaaagtttcacttgaactttcaaataatgttcttatatttgcagt   390
P6(−46,+17)   6.2   0.1   63    tttaaagtttcacttgaactttcaaataatgttcttatatttgcagtgtcgaaagaaacaaag   391
P6(−56,+17)   5.8   0.3   73    attgttaaaatttaaagtttcacttgaactttcaaataatgttcttatatttgcagtgtcgaaagaaacaaag   392
P6(−74,+1)    6.5   0.2   75    gtttgcaatggttaatctattgttaaaatttaaagtttcacttgaactttcaaataatgttcttatatttgcagt   393
P6(−74,+17)   16.6  1.2   91    gtttgcaatggttaatctattgttaaaatttaaagtttcacttgaactttcaaataatgttcttatatttgcagtgtcgaaagaaacaaag   388
P6(−93,+20)   8.8   0.2   114   gactaccttttttttgttttgtttgcaatggttaatctattgttaaaatttaaagtttcacttgaactttcaaataatgttcttatatttgcagtgtcgaaagaaacaaagtag   394
P5(−54,+1)    4.3   0.1   55    agttaatgcacgttaaagtatttgctactgagaaatatatccgtatatttgcagt   405
P5(−93,+20)   8.7   0.3   114   gagtaactacgataataaagtgataattcaatgttaaaacagttaatgcacgttaaagtatttgctactgagaaatatatccgtatatttgcagcgtagaagttattactaacg   406
P5(−74,+17)   Not tested   91   tgataattcaatgttaaaacagttaatgcacgttaaagtatttgctactgagaaatatatccgtatatttgcagcgtagaagttattacta   407

TABLE 7 Promoter activity assay. Promoter “P6” refers to the phage promoter identified in Example 1, and SEQ ID NOs: 395-397 are various truncated versions of the promoter sequence of SEQ ID NO: 8. The underlined nucleotides are those that are added relative to the sequence of SEQ ID NO: 399 (see Table 6).

Promoter       Avg Activity   95% CI   Length (nt)   Sequence   SEQ ID NO
blank cells    1.0    0.0   0
P6(−40,+20)    3.4    1.0   60    gtttcacttgaactttcaaataatgttcttatatttgcagtgtcgaaagaaacaaagtag   395
P6(−60,+20)    20.8   4.7   80    atctattgttaaaatttaaagtttcacttgaactttcaaataatgttcttatatttgcagtgtcgaaagaaacaaagtag   396
P6(−80,+20)    50.8   8.6   100   tgttttgtttgcaatggttaatctattgttaaaatttaaagtttcacttgaactttcaaataatgttcttatatttgcagtgtcgaaagaaacaaagtag   397
P6(−100,+20)   57.8   2.6   120   caattgggctaccttttttttgttttgtttgcaatggttaatctattgttaaaatttaaagtttcacttgaactttcaaataatgttcttatatttgcagtgtcgaaagaaacaaagtag   8

Note: the results of Table 7 are not directly comparable to those of Table 6. Thus, direct comparisons can be made within each table, but not across tables.
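The truncation names in Tables 6 and 7 appear to encode coordinates relative to the transcription start site at +1, with no position 0, so that P6(−a,+b) has length a+b. That convention is inferred here from the listed lengths rather than stated in the text. Under that assumption, and with a hypothetical helper function, the truncated variants can be derived from the 120-nt sequence listed for P6(−100,+20):

```python
# Sequence listed in Table 7 for P6(-100,+20), SEQ ID NO: 8.
FULL_P6 = (
    "caattgggctaccttttttttgttttgtttgcaatggt"
    "taatctattgttaaaatttaaagtttcacttgaacttt"
    "caaataatgttcttatatttgcagtgtcgaaagaaaca"
    "aagtag"
)

def truncate(full, full_upstream, upstream, downstream):
    """Return the sub-sequence spanning -upstream..+downstream.

    `full` is assumed to start at position -full_upstream; positions run
    ..., -2, -1, +1, +2, ... with no position 0 (an inferred convention).
    """
    start = full_upstream - upstream
    end = full_upstream + downstream
    if start < 0 or end > len(full):
        raise ValueError("requested range lies outside the full sequence")
    return full[start:end]

p6_40_20 = truncate(FULL_P6, 100, 40, 20)
p6_36_1 = truncate(FULL_P6, 100, 36, 1)
print(len(FULL_P6), len(p6_40_20), len(p6_36_1))  # 120 60 37
print(p6_36_1)  # cacttgaactttcaaataatgttcttatatttgcagt
```

The two slices reproduce the sequences listed for P6(−40,+20) in Table 7 and P6(−36,+1) in Table 6.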

Example 3: Promoter Function in Multiple Different Cells

FIG. 20 demonstrates that a subject promoter that is operable in Bacteroides cells can also be operable in other types of prokaryotic cells (e.g., E. coli cells). Thus, in some cases, a subject promoter, in addition to being operable in Bacteroides cells, is also operable in non-Bacteroides cells (e.g., prokaryotic cells such as E. coli cells). FIG. 20 depicts E. coli cells expressing a GFP transgene that is operably linked to the promoter of SEQ ID NO: 388 (which is demonstrated herein to be operable in Bacteroides cells, and also in E. coli cells).

Example 4: Cleavable Linkers Tested for Secreted Fusion Proteins

To develop a peptide secretion strategy, proteins were identified that function across the Bacteroides genus to secrete tethered peptides. Peptides tethered by linkers designed to be cleaved by gut proteases were cleanly released.

Results

Peptide Secretion Strategy

In addition to the high-throughput strain modification and strong, predictable protein expression methods developed here, it was desired to further expand the repertoire of tools available for engineering gut-resident prokaryotic species (e.g., Bacteroides species). Reliable means of heterologous protein secretion in gram-negative bacteria are lacking, and previously described signal sequences were unable to direct proteins of interest outside of cells. In order to take advantage of native protein secretion in the Bacteroides, a mass spectrometry-based proteomics assay was performed to determine natural secreted proteins from B. thetaiotaomicron (FIG. 17). Multiple candidate secreted proteins were cloned under strong constitutive expression, using native RBSs, with a C-terminal triple FLAG tag and tested for soluble secretion into the media. Many proteins were identified to be secreted via outer membrane vesicles (OMVs), some having been identified in a recently published study on Bacteroides OMVs, and one candidate (product of hypothetical ORF BT_0525) was identified to be secreted as a soluble protein into the cell culture medium using a carefully designed Western blot technique to account for cell lysis when analyzing protein secretion. To develop BT0525 (SEQ ID NO: 459) as a generalizable tool for protein secretion in the Bacteroides, secretion of a protein from the six Bacteroides species used above (which, as described above, were used to test variations of P_(BfP1E6)) was attempted. The same strong, constitutively expressed and FLAG-tagged version of BT0525 that was used to confirm soluble secretion in B. theta was chromosomally inserted into the other six species. Translatability of secretion of BT0525 into the culture supernatant across divergent members of the Bacteroides genus was demonstrated (FIG. 17B). Using this broadly applicable carrier protein, a system was designed to deliver peptide cargo from Bacteroides cells into the gut milieu. Because the gastrointestinal tract is rich with proteases, linkers were used to connect the peptide cargo to the carrier protein with motifs that could be targeted by common gut proteases (FIG. 17C). It was next experimentally demonstrated that B. thetaiotaomicron grown in vitro can secrete a 30 amino acid 6×His/3×FLAG tag (HHHHHH-GG-DYKDHDG-DYKDHDI-DYKDDDDK) (SEQ ID NO: 410) cargo peptide, and that the cargo was released upon treatment with extract from murine cecal contents (FIG. 17C). When the linker was mutated at the predicted amino acid cleavage site, the peptide cargo was no longer released upon treatment with cecal extract.

FIG. 22. Bt secretes proteins via OMVs. When secreted protein candidates were cloned under constitutive expression with a 3×FLAG tag and cell pellet (P), cell-free culture supernatant (S), ultracentrifuged S to remove OMVs (U), and recovered OMVs (O) were analyzed via western blot, protein products of BT1488 and BT3742 localized to OMVs (presence of BT3742 in the ultracentrifuged supernatant is accounted for by lysis) while BT0525 localized mainly to the cell-free supernatant.

FIG. 23. Diverse species of Bacteroides secrete BT0525. Western blot analysis of Bv, Bu, and Be strains expressing sfGFP and BT0525, each under P_(BfP1E6) and with a 3×FLAG tag. Cell pellets show expression of both proteins, while culture supernatants demonstrate secretion of BT0525 independent of lysis. These three species of Bacteroides are able to accumulate more BT0525 signal in the supernatant than Bt, Bf, or Bo for unknown reasons. This could be due to differential expression of secretion machinery, degradation machinery in the periplasm or at the cell membrane, or of proteases that are released extracellularly.

Materials and Methods

Secreted protein proteomics. Wild-type Bt was grown in 150 mL Salyer's Minimal Media+glucose in triplicate, anaerobically at 37° C. to mid-log. Cultures were centrifuged at 2700×g for 20 minutes to pellet the cells. Culture supernatant was then filter sterilized with a 0.2 μm filter (Corning), concentrated 300× with 10 k Centriprep centrifugal concentrator tubes (Millipore), and buffer exchanged into 50 mM Tris at a pH of 8. A 1 mL aliquot of cell pellet was resuspended in 1 mL urea lysis buffer+protease inhibitor (Roche). Cell pellet and culture supernatant were each run on an SDS-PAGE gel and stained with Coomassie to visualize protein banding patterns in each fraction. The same samples were then analyzed by GC-MS, and reads were mapped back to the Bt protein database and identified by predicted ORF. The average reads in the cell pellet and culture supernatant for individual proteins found in two of the three replicates were plotted with standard deviation to visualize representation in each cell fraction.

Western blot analysis of secreted proteins. To differentiate between protein in the cell culture supernatant due to active secretion as compared to cell lysis, a control Bt strain expressing genomically integrated 3×FLAG-tagged superfolder GFP (which folds too efficiently to be secreted, allowing GFP signal in the supernatant to act as a proxy for cell lysis) was developed. Candidate proteins of interest were then cloned under P_(BfP1E6) and their native RBS with a C-terminal 3×FLAG tag, and genomically integrated into the lysis control strain. For testing secretion of BT0525 in diverse Bacteroides species, the GFP lysis control plasmid was subcloned into the BT0525 expression plasmid via BamHI/XbaI and BglII/SpeI sites and the resulting construct was genomically integrated into Bt, Bf, Bv, Bu, Bo, and Be, as Be appeared unable to accept two separate plasmids. All strains tested for protein secretion were grown to mid-log in either Salyer's Minimal Media+glucose or in TYG. Cultures were centrifuged at 8000×g in a tabletop centrifuge for 10 minutes, culture supernatant was harvested, and cell pellet was resuspended in PBS at the original volume. To test for secretion via OMVs, culture supernatants were filter sterilized with a 0.2 μm filter (Corning), 44 mL were centrifuged in a 70Ti rotor in a Beckman Coulter Optima L-90K ultracentrifuge at 37 k rpm and 4° C. for 2 hours, washed in PBS, and OMV pellets were resuspended in 1 mL PBS. Cell pellet fractions were diluted 1:20 in PBS to achieve linear-range visualization on the western blot, and run with undiluted supernatant samples on SDS-PAGE gels. Samples were blotted onto nitrocellulose membranes using the iBlot dry transfer system (Life Technologies), and stained with an anti-FLAG HRP-conjugated antibody (Sigma).

Peptide release via cleavable linkers. Strains of Bt expressing BT0525 linked to a 6×His-3×FLAG tag via designed linkers were grown overnight in TYG. Cultures were centrifuged at 8000×g for 10 minutes, and supernatant was harvested. Supernatant was exposed to either PBS or increasing concentrations of cecal extract (liquid fraction of centrifuged murine cecal contents from conventional mice) for 10 minutes at 37° C. Digestion was immediately stopped by addition of reducing SDS-PAGE sample buffer and heat treatment at 70° C. for 10 minutes. Samples were analyzed via western blotting as described above.

Table 11 provides data from testing a number of cleavable linkers positioned between a polypeptide of interest and a secreted Bacteroides protein (BT0525) (SEQ ID NO: 459). A nucleotide sequence of interest encoding the fusion protein (the secreted Bacteroides protein fused to the polypeptide of interest) was placed under the control of a subject promoter (operable in Bacteroides cells) and the nucleic acid was integrated into the genome of a Bacteroides cell. The secreted fusion protein was then collected and assayed to determine whether the linker was cleaved.

TABLE 11
The cleavable linkers of Table 10 were tested for their ability to function.

Linker  Amino acid sequence     Target peptidase                Secretion  Cleavage by    SEQ ID
        (cleavage observed at                                   detected   gut contents?  NO
        bold amino acid)
CL0     GSGSSGGS                Control (no cleavage expected)  High       No             420
CL1     SGPTGHGR                Trypsin                         Moderate   Yes            422
CL2     SGPTGMAR                Trypsin                         Weak       Yes            423
CL3     SGPTASPL                Chymotrypsin                    High       Yes            424
CL4     SGPTTAPF                Chymotrypsin B                  High       Yes            425
CL5     SGPTAAPA                Elastase 1                      High       Yes            426
CL4x    SGPTTAPG                Control (no cleavage expected)  High       No             421
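For readers who wish to work with the Table 11 results programmatically, the rows can be captured as a small data structure. The following is a minimal sketch: the values are transcribed from Table 11, while the type name, field names, secretion ranking, and helper function are hypothetical conveniences and not part of the disclosure.

```python
from typing import List, NamedTuple, Optional


class Linker(NamedTuple):
    name: str
    sequence: str
    target: Optional[str]  # None for the no-cleavage-expected controls
    secretion: str         # "Weak", "Moderate", or "High"
    cleaved: bool          # cleavage by gut contents observed?
    seq_id_no: int


# Rows transcribed from Table 11.
TABLE_11 = [
    Linker("CL0", "GSGSSGGS", None, "High", False, 420),
    Linker("CL1", "SGPTGHGR", "Trypsin", "Moderate", True, 422),
    Linker("CL2", "SGPTGMAR", "Trypsin", "Weak", True, 423),
    Linker("CL3", "SGPTASPL", "Chymotrypsin", "High", True, 424),
    Linker("CL4", "SGPTTAPF", "Chymotrypsin B", "High", True, 425),
    Linker("CL5", "SGPTAAPA", "Elastase 1", "High", True, 426),
    Linker("CL4x", "SGPTTAPG", None, "High", False, 421),
]

# Assumed ordering of the qualitative secretion levels used in the table.
SECRETION_RANK = {"Weak": 0, "Moderate": 1, "High": 2}


def cleaved_linkers(rows: List[Linker], min_secretion: str = "Weak") -> List[str]:
    """Names of linkers that were cleaved and secreted at or above a level."""
    threshold = SECRETION_RANK[min_secretion]
    return [
        row.name
        for row in rows
        if row.cleaved and SECRETION_RANK[row.secretion] >= threshold
    ]
```

For example, calling `cleaved_linkers(TABLE_11, "High")` selects the linkers that combined high secretion with cleavage by gut contents.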

Example 5: Polypeptides of Interest are Assayed for their Ability to Treat Colitis in Mice

The data presented herein show that, by combining these tools, two anti-inflammatory peptides were successfully delivered to mice with colitis, and that these delivered peptides successfully treated murine colitis.

Results

To test the efficacy of this peptide delivery system in vivo, the ability of Bt secreting BT0525 linked to anti-inflammatory peptides to offset the effects of DSS-induced colitis in mice was examined. Male germ-free mice were colonized with a model community of three representative organisms: Clostridium scindens, Edwardsiella tarda, and Bacteroides vulgatus. After allowing two weeks for community equilibration, the mice were switched to 5% DSS in the drinking water to induce colitis. Simultaneously, Bt secreting one of three anti-inflammatory peptides (AIP), namely FpMAM-pep5 (SEQ ID NO: 412), 101.10 (SEQ ID NO: 411), or KPV (SEQ ID NO: 415), via cleavable linkage to BT0525 expressed with PBfP1E6 was administered. Weight of the mice was monitored for nine days, and Disease Activity Index (DAI) was measured at sacrifice on day nine. Mice receiving either FpMAM-pep5 or 101.10 lost significantly less weight than mice that did not receive treatment (FIG. 18a), and demonstrated significantly lower DAI scores than mice that received KPV (FIG. 18b). The experiment was similarly repeated for FpMAM-pep5 delivery in conventional mice via daily oral gavage, which also produced a significant alleviation of DSS-induced weight loss. This demonstrates that the collection of tools developed here functions in the gut, delivering enough anti-inflammatory peptide to significantly impact host physiology.

Materials and Methods

Mouse colitis treatment experiment. Male germ-free Swiss Webster mice (Taconic) were orally gavaged with an equal mixture of Edwardsiella tarda, Clostridium scindens, and Bacteroides vulgatus from overnight culture. After two weeks of community equilibration, mice were switched to 5% Dextran Sodium Sulfate (Affymetrix) in the drinking water. They were simultaneously orally gavaged with ~10⁸ CFU of a 1:1:1 mix of Bt expressing an anti-inflammatory peptide linked to BT0525 via cleavable linkers 1, 3, and 4 (SUPP), and were grouped as follows: FpMAM-pep5 (n=4), 101.10 (n=3), or KPV (n=4). Mice were weighed each day for nine days, and sacrificed on day nine. At sacrifice, stool consistency, blood in the stool (Hemoccult SENSA, Beckman Coulter), and final weight were measured to calculate the Disease Activity Index. The same experiment was performed using 5 female mice that received no treatment, as a baseline measurement of the weight response to DSS.

Table 12 provides data from testing whether various therapeutic peptides could be used as polypeptides of interest to treat colitis in mice. The indicated peptide was fused to a secreted Bacteroides protein (BT0525) (SEQ ID NO: 459) with a cleavable linker (cleavable by gut proteases) positioned between them. A nucleotide sequence of interest encoding the fusion protein (the secreted Bacteroides protein fused to the indicated peptide) was placed under the control of a subject promoter (operable in Bacteroides cells) and the nucleic acid was integrated into the genome of Bacteroides cells. The Bacteroides cells were then introduced into the guts of mice. The mice were administered DSS (a mouse model of colitis) and the effect of the introduced bacteria (secreting the fusion protein) on colitis was assayed.

TABLE 12
The peptides of Table 12 were tested for their ability to impact DSS-induced colitis in mice.

Peptide      SEQ ID NO  Type                           Significant effect in mice
101.10       411        IL-1 inhibitory peptides       Yes - reduced disease
Fp MAM-pep5  412        anti-NF-κB                     Yes - reduced disease
CD80-CAP1    413        CD80 antagonistic peptide      *Yes - negative impact, likely due to too high a dose
Pep2305      414        IL-23 inhibitory peptides      No
KPV          415        NF-κB and MAPK inhibition      No
WP9QY        416        anti-TNF                       No
P144         417        TGF-β inhibitory peptide       No

*Various different reduced doses can now be routinely and systematically tested, e.g., using the promoters presented herein that have a wide variety of strengths.

The preceding merely illustrates the principles of the invention. It will be appreciated that those skilled in the art will be able to devise various arrangements which, although not explicitly described or shown herein, embody the principles of the invention and are included within its spirit and scope. Furthermore, all examples and conditional language recited herein are principally intended to aid the reader in understanding the principles of the invention and the concepts contributed by the inventors to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions. Moreover, all statements herein reciting principles, aspects, and embodiments of the invention as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents and equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure. The scope of the present invention, therefore, is not intended to be limited to the exemplary embodiments shown and described herein. Rather, the scope and spirit of the present invention is embodied by the appended claims.

What is claimed is:
 1. A nucleic acid, comprising: (a) a promoter operable in a prokaryotic cell, wherein the promoter comprises a nucleotide sequence comprising one or more of the following: (i) 80% or more sequence identity of defined nucleotides of the nucleotide sequence: GTTAA (n)₄₋₇ GTTAA (n)₃₄₋₃₈ TA (n)₂ TTTG, (ii) 80% or more sequence identity with a sequence set forth in any of SEQ ID NOs: 388 and 407, (iii) a nucleotide sequence comprising GTTAA (n)₄₋₇ GTTAA, (iv) a nucleotide sequence comprising GTTAA (n)₄₄₋₅₀ TA, (v) a nucleotide sequence comprising GTTAA (n)₄₈₋₅₄ TTTG, (vi) a nucleotide sequence comprising GTTAA (n)₃₆₋₃₈ TA, (vii) a nucleotide sequence comprising GTTAA (n)₄₀₋₄₂ TTTG, (viii) a nucleotide sequence comprising GTTAA (n)₃₋₇ GTTAA (n)₃₆₋₃₈ TA, (ix) a nucleotide sequence comprising GTTAA (n)₃₋₇ GTTAA (n)₄₀₋₄₂ TTTG, (x) a nucleotide sequence comprising GTTAA (n)₄₄₋₅₀ TA (n)₂ TTTG, (xi) a nucleotide sequence comprising GTTAA (n)₃₆₋₃₈ TA (n)₂ TTTG, (xii) a nucleotide sequence comprising GTTAA (n)₀₋₂₀ GTTAA (n)₁₀₋₆₀ TA (n)₀₋₁₀ TTTG, (xiii) a nucleotide sequence comprising TTAA (n)₀₋₁₀ TTAA (n)₃₀₋₅₀ TA (n)₂ TTTG, (xiv) a nucleotide sequence comprising GTTAA (n)₄₋₇ GTTAA (n)₃₆₋₃₉ TA (n)₂ TTTGC, (xv) a nucleotide sequence comprising GTTAA (n)₄₋₇ GTTAA (n)₃₆₋₃₉ TA (n)₂ TTTG, (xvi) a nucleotide sequence comprising GTTAA (n)₄₋₇ GTTAA (n)₃₄₋₃₈ TA (n)₂ TTTG, (xvii) a nucleotide sequence comprising GTTAA (n)₄₋₇ GTTAA (n)₃₆₋₃₈ TA (n)₂ TTTG, (xviii) a nucleotide sequence comprising GTTAA (n)₃₋₇ GTTAA (n)₃₆₋₃₈ TA (n)₂ TTTG, (xix) a nucleotide sequence comprising GTTAA (n)₄₋₇ GTTAA (n)₁₂₋₁₆ TTG (n)₁₈₋₂₂ TA (n)₂ TTTGC, (xx) a nucleotide sequence comprising GTTAA (n)₃₋₇ GTTAA (n)₁₂₋₁₆ TTG (n)₁₈₋₂₂ TA (n)₂ TTTG, (xxi) a nucleotide sequence comprising GTTAA (n)₄₋₈ GTTAA (n)₁₂₋₁₆ TTG (n)₁₈₋₂₂ TA (n)₂ TTTG, and (xxii) a nucleotide sequence comprising GTTAA (n)₄₋₇ GTTAA (n)₁₂₋₁₆ TTG (n)₁₈₋₂₂ TA (n)₂ TTTG, wherein each n is independently selected from A, C, G, and T; and (b) a nucleotide sequence of interest that is operably linked to the promoter, wherein the nucleotide sequence of interest and the promoter are not found operably linked in nature.
 2. The nucleic acid of claim 1, wherein the prokaryotic cell is a Bacteroides cell.
 3. A nucleic acid, comprising: (a) a promoter operable in a Bacteroides cell, and (b) a nucleotide sequence of interest that is operably linked to the promoter, wherein the nucleotide sequence of interest and the promoter are not found operably linked in nature, wherein the promoter provides one or more of the following when the nucleic acid is expressed in the Bacteroides cell: (i) an increase in mRNA production of at least 30% relative to a native Bacteroides promoter, (ii) an increase in fluorescence of at least 2000% relative to autofluorescence, wherein the nucleotide sequence of interest encodes super-folding GFP, or (iii) a cytoplasmic protein concentration of at least 1.5 μM, wherein the nucleotide sequence of interest encodes the protein.
 4. The nucleic acid of claim 3, wherein the native Bacteroides promoter is a native Bacteroides rRNA promoter.
 5. The nucleic acid of claim 3, wherein the increase in mRNA production is at least 50%.
 6. The nucleic acid of claim 3, wherein the increase in fluorescence is at least 5000%.
 7. The nucleic acid of claim 3, wherein the increase in fluorescence is at least 8000%.
 8. The nucleic acid of claim 3, wherein the cytoplasmic protein concentration is at least 2 μM.
 9. The nucleic acid of claim 3, wherein the cytoplasmic protein concentration is at least 5 μM.
 10. The nucleic acid of claim 3, wherein the cytoplasmic protein concentration is at least 10 μM.
 11. The nucleic acid of claim 3, wherein the protein is luciferase.
 12. The nucleic acid of claim 1 or 3, wherein the promoter is a phage promoter or a functional fragment thereof.
 13. The nucleic acid of claim 12, wherein the phage is ϕB124-14.
 14. The nucleic acid of claim 1 or 3, wherein the promoter is a non-naturally occurring promoter.
 15. The nucleic acid of claim 1 or 3, wherein the promoter comprises a nucleotide sequence having 80% or more sequence identity with the nucleotide sequence: GTTAA (n)₃₋₇ GTTAA (n)₃₆₋₃₈ TA (n)₂ TTTG (SEQ ID NO: 400).
 16. The nucleic acid of claim 1 or 3, wherein the promoter comprises the nucleotide sequence set forth in any of SEQ ID NOs: 381-388.
 17. The nucleic acid of claim 1 or 3, wherein the nucleotide sequence of interest comprises a transgene sequence that encodes a protein.
 18. The nucleic acid of claim 17, wherein the protein encoded by the transgene sequence comprises a reporter protein, a selectable marker protein, a metabolic enzyme, or a therapeutic protein.
 19. The nucleic acid of claim 17, wherein the protein encoded by the transgene sequence is a fusion protein comprising a cleavable linker and a secreted Bacteroides polypeptide fused to a heterologous polypeptide of interest, wherein the cleavable linker is positioned between the secreted Bacteroides polypeptide and the polypeptide of interest.
 20. The nucleic acid of claim 1 or 3, wherein the nucleotide sequence of interest comprises a transgene sequence that encodes a non-coding RNA.
 21. A prokaryotic cell comprising the nucleic acid of any of claims 1-20.
 22. The prokaryotic cell of claim 21, wherein the nucleic acid is integrated into a chromosome of the prokaryotic cell.
 23. The prokaryotic cell of any of claims 21-22, wherein the prokaryotic cell is a Bacteroides cell.
 24. The prokaryotic cell of any of claims 21-22, wherein the prokaryotic cell is not a Bacteroides cell.
 25. The prokaryotic cell of claim 24, wherein the prokaryotic cell is an E. coli cell.
 26. A method of expressing a nucleic acid in a prokaryotic cell, the method comprising: introducing the nucleic acid of any of claims 1-20 into the prokaryotic cell.
 27. The method of claim 26, wherein the prokaryotic cell is a Bacteroides cell.
 28. The method of claim 27, wherein the Bacteroides cell is a cell of a species selected from: B. fragilis (Bf), B. distasonis (Bd), B. thetaiotaomicron (Bt), B. vulgatus (Bv), B. ovatus (Bo), B. eggerthii (Be), B. merdae (Bm), B. stercoris (Bs), B. uniformis (Bu), and B. caccae (Bc).
 29. The method of claim 26, wherein the prokaryotic cell is an E. coli cell.
 30. The method of any of claims 26-29, wherein the nucleotide sequence of interest is a transgene encoding a fusion protein comprising a cleavable linker and a secreted Bacteroides polypeptide fused to a heterologous polypeptide of interest, wherein the cleavable linker is positioned between the secreted Bacteroides polypeptide and the polypeptide of interest.
 31. A fusion protein comprising: a secreted Bacteroides polypeptide fused to a heterologous polypeptide of interest.
 32. The fusion protein of claim 31, wherein the secreted Bacteroides polypeptide comprises an amino acid sequence that has 80% or more sequence identity with an amino acid sequence set forth in any of SEQ ID NOs: 458-484.
 33. The fusion protein of any of claims 31-32, comprising a cleavable linker positioned between the secreted Bacteroides polypeptide and the polypeptide of interest.
 34. The fusion protein of claim 33, wherein the cleavable linker is cleavable by one or more gut proteases.
 35. The fusion protein of claim 34, wherein the cleavable linker is cleavable by one or more gut proteases selected from: a trypsin, a chymotrypsin, and an elastase.
 36. The fusion protein of claim 34, wherein the cleavable linker is set forth in any of SEQ ID NOs: 420-453.
 37. The fusion protein of any of claims 31-36, wherein the polypeptide of interest is an anti-inflammatory peptide.
 38. The fusion protein of claim 37, wherein the anti-inflammatory peptide comprises an amino acid sequence set forth in any of SEQ ID NOs: 411-417.
 39. A nucleic acid encoding the fusion protein of any of claims 31-38.
 40. The nucleic acid of claim 39, wherein the nucleic acid is a plasmid.
 41. The nucleic acid of claim 40, wherein the plasmid comprises an origin of replication that functions in prokaryotic cells other than Bacteroides cells, but does not function in Bacteroides cells.
 42. An outer membrane vesicle, comprising the fusion protein of any of claims 31-38.
 43. A method of delivering a polypeptide, comprising: recombinantly expressing the fusion protein of any of claims 31-38 in a prokaryotic cell; and delivering the fusion protein or the polypeptide of interest outside of the prokaryotic cell.
 44. The method of claim 43, further comprising delivering the fusion protein or the polypeptide of interest to a gut.
 45. The method of claim 43, further comprising packaging the fusion protein or the polypeptide of interest into an outer membrane vesicle.
 46. The method of claim 45, further comprising fusing the outer membrane vesicle with a cell membrane of a second cell.
 47. The method of claim 43, further comprising delivering the fusion protein or the polypeptide of interest to a second cell.
 48. The method of claim 47, wherein the second cell is a eukaryotic cell.
 49. The method of claim 47, wherein the second cell is a mammalian cell.
 50. A method of delivering a protein to an individual's gut, the method comprising: introducing, into an individual's gut, a Bacteroides cell comprising the nucleic acid of any one of claims 1-20 and 39-41.
 51. The method of claim 50, wherein the individual has a disease impacted by gut microbiota.
 52. The method of claim 50, wherein the individual has a disease selected from: obesity, diabetes, heart disease, central nervous system diseases, rheumatoid arthritis, metabolic disorders, and cancer.
 53. The method of claim 50, wherein the individual has gut inflammation.
 54. The method of claim 50, wherein the individual has colitis.
 55. The method of any of claims 50-54, wherein the Bacteroides cell is a cell of a species selected from: B. fragilis (Bf), B. distasonis (Bd), B. thetaiotaomicron (Bt), B. vulgatus (Bv), B. ovatus (Bo), B. eggerthii (Be), B. merdae (Bm), B. stercoris (Bs), B. uniformis (Bu), and B. caccae (Bc).
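By way of illustration only, a spaced motif of the kind recited in claim 1, such as GTTAA (n)₄₋₇ GTTAA (n)₃₄₋₃₈ TA (n)₂ TTTG, can be expressed as a regular expression and used to scan a candidate sequence. The following is a minimal sketch with hypothetical helper names. It finds exact matches to the fixed positions only and does not evaluate the percent-identity alternatives of the claims.

```python
import re


def motif_regex(parts):
    """Compile a spaced motif into a regular expression.

    `parts` is a list whose items are either literal nucleotide strings
    (e.g. "GTTAA") or (min, max) tuples giving the allowed length of a
    spacer in which each n may be A, C, G, or T.
    """
    pieces = []
    for part in parts:
        if isinstance(part, str):
            pieces.append(part)
        else:
            low, high = part
            if low == high:
                pieces.append("[ACGT]{%d}" % low)
            else:
                pieces.append("[ACGT]{%d,%d}" % (low, high))
    return re.compile("".join(pieces))


# GTTAA (n)4-7 GTTAA (n)34-38 TA (n)2 TTTG
CONSENSUS = motif_regex(
    ["GTTAA", (4, 7), "GTTAA", (34, 38), "TA", (2, 2), "TTTG"]
)


def find_motif(sequence):
    """Return (start, end) of the first exact motif match, or None."""
    match = CONSENSUS.search(sequence.upper())
    return match.span() if match else None


# Synthetic test sequence constructed for illustration, not a disclosed promoter.
synthetic = "CC" + "GTTAA" + "ACGTA" + "GTTAA" + "C" * 36 + "TA" + "GG" + "TTTG" + "AA"
hit = find_motif(synthetic)
```

The other spaced motifs of claim 1 can be compiled in the same way by changing the list passed to `motif_regex`.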